
I recently ran an HTML file I was writing through an online HTML validator, and one of the diagnostics I got said:

The character encoding was not declared. Proceeding using "windows-1252".

When I create a webpage, I write it in a text editor, which saves it as DOS-text (with CR-LF line endings). When I upload the file to my web-hosting provider, it gets converted (I think) on the server to Unix text (LF line endings). My text editor can also save files as Unicode including UTF-8, but I rarely find that necessary.

The standard online advice about specifying the character encoding in a web document is to include, just under the <head> tag, <meta charset="utf-8">. There is also advice that you should ensure that what you specify does not conflict with the information sent by the server in the HTTP headers when serving the document. Using Rex Swain's [online] HTTP viewer, I see that in the HTTP headers it just says,

Content-Type:·text/html
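
For anyone who wants to check the same thing without a web tool, here is a minimal sketch in Python that parses a `Content-Type` value and reports whether it carries a `charset` parameter. The helper name `declared_charset` is made up for illustration; it relies only on the standard library's `email.message.Message`, and the header strings are examples, not output from any particular server.

```python
from email.message import Message


def declared_charset(content_type_value):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lowercases the value and returns None when
    # the header has no charset parameter.
    return msg.get_content_charset()


# A bare "text/html", as in the header above, declares no charset at all.
print(declared_charset("text/html"))
# A header that does declare one, for comparison.
print(declared_charset("text/html; charset=UTF-8"))
```

If the first call reports no charset, the server is not declaring an encoding, so whatever the document itself declares has nothing in the header to conflict with.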

Should I follow the standard advice and specify the charset as UTF-8, even though the HTML file is never saved as such? Or should I specify it as windows-1252, as assumed by that online validator, or as ISO-8859-1, as per one of the example values on W3Schools? Also, some examples of the charset meta tag show it terminated with />. Which is the preferred syntax, and should there be a space before the slash?

Moongazer
  • Does [this](https://stackoverflow.com/questions/14669352/is-the-charset-meta-tag-required-with-html5) answer your question? (The /> one is answered [here](https://stackoverflow.com/questions/1946426/html-5-is-it-br-br-or-br).) – BoltClock Jul 26 '17 at 04:12
  • If your files are plain ASCII (no € symbols or letters like æ or ï), then both UTF-8 and ISO-8859-1 will do. Otherwise you should pick something. I'd suggest UTF-8. The `/>` syntax is valid in XML/XHTML but invalid in HTML/SGML. (But HTML parsers are rather liberal; they won't complain much.) – Zsigmond Lőrinczy Jul 26 '17 at 04:31
  • When HTML is sent over HTTP, the default `charset` for the `Content-Type` HTTP header is `ISO-8859-1` if not specified otherwise. The actual HTML can override this with an appropriate `<meta>` tag (`<meta http-equiv="Content-Type" content="text/html; charset=...">` for HTML 4, `<meta charset="...">` for HTML 5). Just make sure the specified charset matches the actual transmitted encoding of the HTML. If the HTML is purely 7-bit ASCII, specifying just about any charset will work, but prefer UTF-8 since it is a superset of ASCII. – Remy Lebeau Jul 26 '17 at 23:54
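
The point in the comments about making the declared charset match the actual bytes can be checked mechanically. The following is a rough sketch, not a definitive tool: the function name `encoding_report` and the sample strings are invented for illustration. It tests whether a file's raw bytes are pure ASCII and whether they are valid UTF-8.

```python
def encoding_report(data: bytes) -> dict:
    """Report whether the bytes decode cleanly as ASCII and as UTF-8."""
    report = {}
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            report[name] = True
        except UnicodeDecodeError:
            report[name] = False
    return report


# Hypothetical samples standing in for files saved by a text editor.
plain = b"<p>hello</p>\r\n"              # 7-bit ASCII with CR-LF line endings
saved_1252 = "café".encode("cp1252")     # é stored as the single byte 0xE9
saved_utf8 = "café".encode("utf-8")      # é stored as the two bytes 0xC3 0xA9

for label, data in [("plain", plain), ("cp1252", saved_1252), ("utf-8", saved_utf8)]:
    print(label, encoding_report(data))
```

To run it on a real file, read the bytes with `open(path, "rb").read()`, where `path` is whatever file you want to inspect. If `ascii` comes back true, declaring UTF-8 is safe. If only `utf-8` is true, the file really is UTF-8. If both are false, the file was saved in some single-byte encoding such as windows-1252, and a UTF-8 declaration would not match the bytes.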
