9

My HTML pages use <meta charset="windows-1252">.

  1. Is changing to UTF-8 recommended and why?
  2. I checked some of my pages with UTF-8 and got question marks in place of some math symbols. E.g., × has to be changed to &times; in order to show correctly. I tried CpConverter, but it did not convert all of the symbols correctly.

Is there a better way to convert many files?

Ryan Vincent
Joe

3 Answers

20
  1. UTF-8 is a generally accepted standard that works everywhere. The Windows-* encodings are Windows-specific and are not guaranteed to work on every machine. Also, take a look here and here
  2. If you want to change the encoding of a file, you can do it in many ways. You can look for an encoding option in your text editor/IDE or use the following command (not tested, it should work though). Note that iconv writes the converted text to standard output, so redirect it to a new file rather than to the input file:

iconv -f WINDOWS-1252 -t UTF-8 filename.txt > filename-utf8.txt
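
For many files at once, a small script may be easier than running the command per file. The following is only a minimal sketch, assuming Python 3, files ending in `.html`, and a declaration written as `<meta charset="windows-1252">`; the function names `convert_html`, `convert_file`, and `convert_tree` are made up for this example. It re-encodes the bytes and updates the declaration together, since changing only one of the two is what produces wrong characters.

```python
import re
from pathlib import Path

# Matches <meta charset="windows-1252"> with optional quotes, in any letter case.
META_RE = re.compile(r"""(<meta\s+charset\s*=\s*["']?)windows-1252(["']?)""", re.IGNORECASE)


def convert_html(text: str) -> str:
    """Point the charset declaration at UTF-8; the rest of the text is untouched."""
    return META_RE.sub(r"\g<1>utf-8\g<2>", text)


def convert_file(path: Path) -> None:
    """Rewrite one file as UTF-8 in place (work on a copy or a backup)."""
    raw = path.read_bytes()
    try:
        # Pure ASCII or already UTF-8: nothing to re-encode.
        # Assumption: windows-1252 text with accented characters is
        # almost never valid UTF-8 by accident.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's cp1252 codec raises on the five bytes windows-1252 leaves undefined.
        text = raw.decode("cp1252")
    path.write_bytes(convert_html(text).encode("utf-8"))


def convert_tree(root: str) -> int:
    """Convert every .html file under root and return how many were processed."""
    count = 0
    for path in Path(root).rglob("*.html"):
        convert_file(path)
        count += 1
    return count
```

Calling `convert_tree("site")` on a hypothetical `site` directory would process every page below it. Because of the UTF-8 check, running it a second time should leave already converted files unchanged.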

Mateusz
0

The answer to your first question is yes. It is recommended that you change the character encoding declaration to UTF-8 in all of your HTML5 documents.

This is because UTF-8 is the encoding the current HTML5 standard calls for, according to the W3C. I would change every page of any given site for this reason alone, since standardization of markup rendering is inevitable.

This can easily be done in any editor that has a find/replace feature. Simply use it to find, in every document, the term

<meta charset="windows-1252">

and replace it with

<meta charset="utf-8"/>

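UTF-8 can represent your math characters. If they still show up as question marks after the change, the file itself is probably still saved as windows-1252 even though the declaration now says UTF-8, so re-save it as UTF-8; failing that, simply leave the original charset as is. The rest of your pages, the ones with plain text only, you will want to change to UTF-8. Here is W3Schools' position on character encoding: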
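
As a rough illustration of where the question marks come from (a sketch assuming Python 3; the variable names are only for this example), the same multiplication sign is one byte in windows-1252 and two bytes in UTF-8, and the windows-1252 byte is not valid UTF-8 on its own:

```python
sample = "2 \u00d7 3"  # "2 × 3", U+00D7 is the multiplication sign

# In windows-1252 the sign is the single byte 0xD7.
cp1252_bytes = sample.encode("cp1252")

# Read as UTF-8, 0xD7 opens a two-byte sequence that never completes,
# so a lenient decoder substitutes U+FFFD (the replacement character),
# which browsers draw as a question-mark glyph.
misread = cp1252_bytes.decode("utf-8", errors="replace")

# Saved as UTF-8, the sign becomes the two bytes 0xC3 0x97 and decodes cleanly.
utf8_bytes = sample.encode("utf-8")

print(cp1252_bytes)                         # b'2 \xd7 3'
print(misread == "2 \ufffd 3")              # True
print(utf8_bytes)                           # b'2 \xc3\x97 3'
print(utf8_bytes.decode("utf-8") == sample) # True
```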

The HTML5 specification encourages web developers to use the UTF-8 character set, which covers almost all of the characters and symbols in the world! --W3Schools.com

If size is an issue, again, leave only the documents with special math character requirements in the original encoding, and only if they don't render correctly; I don't think it will affect your browser load time enough to damage your SEO. If you have many pages with math symbols, this could be a problem if you're aiming for a popular site or a business; if not, the difference is so small that the file size problem seems moot.

For the other documents you should still change the declared encoding to UTF-8, even if you have a BOM.

If you have a UTF-8 byte-order mark (BOM) at the start of your file then recent browser versions other than Internet Explorer 10 or 11 will use that to determine that the encoding of your page is UTF-8. It has a higher precedence than any other declaration, including the HTTP header.

You could skip the meta encoding declaration if you have a BOM, but we recommend that you keep it, since it helps people looking at the source code to ascertain what the encoding of the page is. --w3.org
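
Whether a given file starts with a BOM can be checked by looking at its first three bytes. This is a small sketch, assuming Python 3; `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names, not part of any library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte-order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


with_bom = "<p>hi</p>".encode("utf-8-sig")  # the utf-8-sig codec prepends the BOM
print(has_utf8_bom(with_bom))               # True
print(strip_utf8_bom(with_bom))             # b'<p>hi</p>'
```

For a file on disk, the same check would be applied to the result of reading it in binary mode.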

Good luck and happy coding! :-)

Adam R.
-4

It's an old question but my answer may help someone to decide better.

Changing from ANSI (windows-1252) to UTF-8 approximately doubles the size of HTML files, depending on the characters used in the file.

If you want to test this, just create a file in notepad with the following characters:

الف

These characters exist in both ANSI (Windows-1256) and Unicode. Save the file once with ANSI (Windows-1256) encoding and once again with UTF-8 encoding.

Size of the UTF-8 file: 9 bytes

Size of the ANSI(Windows-1256) file: 3 bytes
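
The byte counts can also be reproduced without Notepad. The sketch below assumes Python 3 and uses its `cp1256`, `utf-8`, and `utf-8-sig` codecs (the last one prepends a BOM, which is an assumption about what Notepad writes when saving as UTF-8); the variable names are only for illustration.

```python
text = "\u0627\u0644\u0641"  # the three Arabic letters from the example above

sizes = {
    "cp1256": len(text.encode("cp1256")),        # one byte per letter
    "utf-8": len(text.encode("utf-8")),          # two bytes per letter
    "utf-8-sig": len(text.encode("utf-8-sig")),  # UTF-8 plus a 3-byte BOM
}
print(sizes)  # {'cp1256': 3, 'utf-8': 6, 'utf-8-sig': 9}

# Plain ASCII markup takes the same number of bytes in both encodings.
ascii_sample = "<p>Hello</p>"
print(len(ascii_sample.encode("cp1252")), len(ascii_sample.encode("utf-8")))  # 12 12
```

So the growth depends entirely on how much of the file is non-ASCII text: 3 of the 9 bytes come from the BOM, and ASCII-only content does not grow at all.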

If you want to change the charset of your pages, simply open them in Notepad or any other editor and use Save As with UTF-8 encoding.

Hossein
  • 6
    Using UTF-8 does not increase the size of your HTML file if you're using the standard Alphanumeric characters. Those characters you wrote cannot be represented in ASCII. When you save it as ASCII, it just converts it to "???" When you save the UTF-8 file in Notepad, 3 of those 9 bytes are the BOM: the byte sequence 0xEF, 0xBB, 0xBF. (Only 6 bytes are used to represent the characters you typed.) – Matthew Nakayama May 12 '18 at 06:24
  • Those characters will be saved with ASCII just fine. They are standard Persian/Arabic chars. but using them with UTF-8 file will double the size. – Hossein May 12 '18 at 19:42
  • 1
    The characters `الف` do not exist in ASCII. It is not possible to encode them in ASCII. –  Jan 27 '19 at 02:06
  • Just try what I've said before down voting. – Hossein Jan 27 '19 at 07:22
  • 2
    As @Isaac said, those characters aren't ASCII, but you seem to be muddling ASCII with Windows-1256, which does allow Arabic characters. – Rich S Apr 11 '19 at 14:54
  • Yes! Those are ANSI(Windows-1256) not ASCII. – Hossein Apr 12 '19 at 20:41