PHP 5.6 default_charset change may break HTML output

An important note for everyone upgrading from PHP 5.4 or PHP 5.5 to PHP 5.6: the default value of default_charset in php.ini changed from “empty” to UTF-8, making UTF-8 the default charset in PHP. This may break HTML output if you declare a different charset in your HTML head. It may also break functions like htmlentities() and htmlspecialchars(). For example:

Breaking HTML output with PHP 5.6 default_charset changed to UTF-8

Suppose you have the following lines in PHP code (don’t ask me why you would want to do this…):

<html> 
<head> 
<meta http-equiv="content-type"
  content="text/html; charset=ISO-8859-1"> 
</head>
<body> 
<?php echo "éééééeeeeééé"; ?>
</body> 
</html>

With the default charset setting in PHP 5.4 and PHP 5.5, this prints the expected string on the screen: éééééeeeeééé. In PHP 5.6, however, default_charset is set to UTF-8, and unless you override it, PHP sends a Content-Type response header with the charset set to UTF-8:

Content-Type: text/html; charset=UTF-8

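This breaks the HTML output, and can break functions like htmlentities() and htmlspecialchars() (PHP bug #61354), because the response now carries two conflicting charset declarations: UTF-8 from PHP’s HTTP header and ISO-8859-1 from the HTML meta tag. Browsers give the HTTP header precedence, so the ISO-8859-1 bytes are decoded as UTF-8. In the lwp-request output below, both declarations show up as Content-Type headers: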

GET -uUsSed http://www.example.com/echo.php
GET http://www.example.com/echo.php
User-Agent: lwp-request/2.07

GET http://www.example.com/echo.php --> 200 OK
Cache-Control: private
Date: Tue, 31 Mar 2015 12:03:50 GMT
Server: Microsoft-IIS/8.0
Content-Length: 141
Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Tue, 31 Mar 2015 12:03:50 GMT
Client-Peer: 77.94.251.244:80
Client-Response-Num: 1
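
To illustrate why the mismatch above ends in broken characters, here is a minimal sketch. It assumes the script file is saved as ISO-8859-1, so that “é” is the single byte 0xE9; the helper names is_valid_utf8() and latin1_to_utf8() are made up for this example and are not part of PHP.

```php
<?php
// Assumption: the file is saved as ISO-8859-1, so "é" is the single byte 0xE9.
$latin1 = "\xE9";

// Hypothetical helper: true if $bytes is a valid UTF-8 sequence.
// The /u modifier makes PCRE validate the subject as UTF-8 before matching.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8 by hand.
function latin1_to_utf8($bytes)
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $code = ord($bytes[$i]);
        // Bytes below 0x80 are plain ASCII; the rest become two-byte sequences.
        $out .= $code < 0x80
            ? chr($code)
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

// The lone 0xE9 byte is not valid UTF-8, which is why a client that was
// told "charset=UTF-8" cannot display it.
var_dump(is_valid_utf8($latin1));

// After re-encoding, the same character is the two bytes 0xC3 0xA9.
var_dump(is_valid_utf8(latin1_to_utf8($latin1)));
```

The sketch avoids the mbstring and iconv extensions so that it runs on a bare PHP install; if either extension is available, its conversion functions are the more usual choice.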

Pro Tip: While going through your PHP config, fix & set correct values for curl.cainfo and openssl.cafile too. Don’t turn off CURLOPT_SSL_VERIFYPEER (or sslverify = false for wp_remote_get() in WordPress).

PHP.net writes:

If omitted, the default value of the encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
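
Following that advice, a minimal sketch of passing the encoding explicitly could look like this. The helper name escape_latin1() and the sample string are assumptions for illustration only; the point is that the third argument makes the result independent of default_charset.

```php
<?php
// Hypothetical helper: escape a string that is known to be ISO-8859-1 encoded.
function escape_latin1($text)
{
    // The explicit encoding argument overrides whatever default_charset is set to.
    return htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');
}

// "café <b>" with é as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9 <b>";

// Treated as ISO-8859-1, the byte passes through and only the markup is escaped.
$ok = escape_latin1($latin1);

// Treated as UTF-8, the lone 0xE9 byte is an invalid sequence. Without
// ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an empty string.
$broken = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');
```

The empty result in the second call is the kind of silent breakage you can run into after the upgrade when the encoding argument is left out and default_charset no longer matches your data.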

Fixing the HTML output in PHP 5.6

The most obvious solution to this problem is to not declare a character set in your HTML meta tag. That is, remove:

<meta http-equiv="content-type"
  content="text/html; charset=ISO-8859-1">

User-defined php.ini to override default_charset

PHP supports user-defined php.ini files, in which you can override some php.ini settings. Upload a user-defined php.ini to your webroot containing the following line:

default_charset = ""

This stops PHP from adding charset=UTF-8 to the Content-Type response header.

Override default_charset with ini_set()

And last but not least, you can override this setting with PHP’s ini_set() function:

ini_set('default_charset', '');
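
As a variation on the same idea, you could set default_charset to the charset the page actually uses instead of clearing it. This is only a sketch: the value ISO-8859-1 is an assumption taken from the example page earlier in this post, and the variable names are made up.

```php
<?php
// Assumption: the page is encoded as ISO-8859-1, as in the example above.
$pageCharset = 'ISO-8859-1';

// ini_set() returns the previous value as a string, or false on failure.
$previous = ini_set('default_charset', $pageCharset);

if ($previous === false) {
    // Fallback: send the header explicitly. This only works as long as
    // no output has been sent yet.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=' . $pageCharset);
    }
}
```

Either way, this has to run before the script produces any output, because the Content-Type header is sent together with the first byte of the response body.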

3 Replies to “PHP 5.6 default_charset change may break HTML output”

  1. Thank you, this saved my day, fixing an old site suddenly full of question marks after upgrading PHP. All your other website performance articles are also very helpful!




  2. In PHP 7.0.19, the empty value of the default_charset directive (default_charset = "") causes an HTTP 500 “Internal Server Error”.
    You can solve this problem by setting the value to “none”. For example:
    php_value default_charset = none
    or
    php_value default_charset = "none"
    or
    php_value default_charset = 'none'



