17

I spent the last few hours getting my website to validate HTML 4.01 Strict and I actually have succeeded in that, but there is still one warning which I can't get rid of. The warning is:

Character Encoding mismatch!

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8). I will use the value from the HTTP header (iso-8859-1) for this validation.

The page in question is www.dubiousarray.net/default.html. As you can see from the page source I have the following meta element:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

and I have made sure that the default.html file is saved with UTF-8 encoding. The strange thing is all the other pages in the site validate without this warning and they have the same meta tag and were saved in exactly the same way. I am pretty sure it is something to do with the server configuration. The .htaccess file looks like this at the moment:

# Use PHP 5 as default
AddHandler application/x-httpd-php5 .php
AddDefaultCharset UTF-8

But I have tried all the fixes shown on this page and none of them worked. How can I go about getting rid of this warning?

In Firefox, if you right click on the page and select 'View Page Info', default.html shows as ISO-8859-1, while all the other pages show UTF-8.

All the HTML files have been created and saved in exactly the same way (character encoding set to UTF-8 without BOM), but default.html is the only one which isn't displaying as UTF-8. So I assume the server is doing something special to the default.html file, though I am not sure what, as there is no sign of it in the .htaccess file.
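
To see exactly what the server declares for each page, independent of the browser, the Content-Type response header can be inspected directly. The following is a minimal Python sketch, not part of the original setup: the function names `charset_from_content_type` and `served_charset` are made up for illustration, and it assumes the server answers HEAD requests.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


def served_charset(url):
    """Send a HEAD request and report the charset the server declares."""
    with urlopen(Request(url, method="HEAD")) as response:
        value = response.headers.get("Content-Type")
    if value is None:
        return None
    return charset_from_content_type(value)


if __name__ == "__main__":
    # The header from the validator warning versus the one the <meta> tag asks for.
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html;charset=utf-8"))
```

Calling `served_charset` with the full URL of default.html and then with the URL of one of the other pages would be expected to show the same ISO-8859-1 versus UTF-8 difference that Firefox reports.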

Peter Mortensen
Jacob de Lacey

5 Answers

26

You need to replace the HTTP-level header.

This should work:

<?php
    header('Content-type: text/html; charset=utf-8');
?>

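Note that this call must run before any output is sent, so it has to be the first thing in your file, ahead of any HTML or whitespace. No exceptions. See the PHP documentation for header().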

For general information on how to change the character set header in different web stacks, see Setting the HTTP charset parameter.
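
The reason the header has to change, rather than the meta element, is that the HTTP header takes priority when both declare a charset, which is what the validator warning says it is doing. Here is a rough Python sketch of that comparison. It is only an illustration: `MetaCharsetFinder` and `compare_charsets` are hypothetical names, and this is not how the validator itself is implemented.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the first charset declared by a <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attributes["charset"].strip().lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="...;charset=...">
            content = (attributes.get("content") or "").lower()
            if "charset=" in content:
                self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def compare_charsets(header_charset, html):
    """Return (charset that wins, True if header and <meta> disagree)."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    header = header_charset.lower() if header_charset else None
    # The HTTP header wins; the <meta> value is only a fallback.
    effective = header or finder.charset
    mismatch = (
        header is not None
        and finder.charset is not None
        and header != finder.charset
    )
    return effective, mismatch


if __name__ == "__main__":
    page = '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">'
    print(compare_charsets("iso-8859-1", page))
    print(compare_charsets("utf-8", page))
```

With the meta element from the question and a header charset of iso-8859-1, the function reports iso-8859-1 as the effective charset and flags a mismatch. Once the header says utf-8, the mismatch goes away.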

Peter Mortensen
Larry K
  • That looks like it should fix the problem, as it runs after the server messes with the header, but since I no longer have the default.html file that was causing the problem, I can't test it. I'm going to assume it would have worked though :-). – Jacob de Lacey May 25 '09 at 14:18
  • But this doesn't work for a static HTML page (even if the question mentions some unrelated PHP setting in a configuration file). – Peter Mortensen Aug 15 '21 at 12:58
2

Include this in your code:

<meta charset="utf-8" />
Peter Mortensen
2

The server is clearly marking the document as ISO-8859-1 in the HTTP headers. Try saving default.html using UTF-8 encoding with a proper editor.

hannson
  • The default.html file has been saved as UTF-8 with a proper editor (UTF-8 without BOM to be exact). And I am almost certain that the encoding is correct when I save it, as all the other pages on the site are shown as UTF-8 when inspected by Firefox. The server seems to be doing something special to the default.html file. – Jacob de Lacey May 25 '09 at 03:19
1

Okay, I have come up with a partial solution to my problem. As it was only the default.html file which was causing the warning I assumed that the server was doing something special to it because of its name. So I made a new file called home.html with the same contents as the default.html file and pointed the .htaccess file to the new file (see line 3 below).

# Use PHP5 as default
AddHandler application/x-httpd-php5 .php
DirectoryIndex home.html
AddDefaultCharset UTF-8

This fixed the problem and all files are now recognised as UTF-8. I'm still not sure what the server was doing to the default.html file or where the settings concerning that are, but as my problem is gone I will forget about that.

Peter Mortensen
Jacob de Lacey
0

Remove AddDefaultCharset from .htaccess and check the encoding.

I saved the HTML source of your webpage and opened it. Encoding was detected as UTF-8. However, on viewing the same webpage served by your web server, the encoding is ISO-8859-1. That is why I suggest removal of the former redundant rule.
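
The local half of that check, whether the saved bytes really are UTF-8 and whether they start with a BOM, can be roughly approximated in a few lines of Python. This is only a sketch: `describe_utf8` is a made-up helper, and a file containing nothing but ASCII will pass as UTF-8 and as ISO-8859-1 alike, so it cannot tell those two apart.

```python
import codecs


def describe_utf8(data):
    """Return (is_valid_utf8, has_bom) for raw file bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False, has_bom
    return True, has_bom


if __name__ == "__main__":
    # The same text saved in two encodings gives different byte sequences.
    print(describe_utf8("café".encode("utf-8")))
    print(describe_utf8("café".encode("iso-8859-1")))
```

To run it against a real file, read the bytes in binary mode, for example `describe_utf8(open("default.html", "rb").read())`. If that reports valid UTF-8 while the served page still shows ISO-8859-1, the file is fine and the HTTP header is the thing to fix.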

Peter Mortensen
Alan Haggai Alavi
  • I removed that rule from the .htaccess file; you are right that it does seem redundant. Removing it didn't do anything to fix the problem though. – Jacob de Lacey May 25 '09 at 03:16