Unless you know for a fact that you will always be using the same software and computer system to edit your HTML, you will inevitably run into situations where you cannot edit your own code if you directly use symbols, regardless of what character encoding you specify in your document or with your HTTP headers. Only in a perfect world does the character encoding always properly transfer, and even then neither Macintosh nor Windows truly does it correctly.
If I open up a supposedly "properly" encoded document from either Macintosh or Windows in software that truly supports all available encoding systems, I see a message like this:
-=-J(DOS)**--F1 Top L3 (Text) ----------------------------------------
These default coding systems were tried to encode text
in the buffer:
(iso-2022-7bit-dos (284 . 4194194) (379 . 4194194) (462 . 4194195)
(492 . 4194196) (635 . 4194195) (640 . 4194196) (642 . 4194195) (772
. 4194196) (833 . 4194195) (839 . 4194196) (857 . 4194195))
(utf-8-dos (284 . 4194194) (379 . 4194194) (462 . 4194195) (492
. 4194196) (635 . 4194195) (640 . 4194196) (642 . 4194195) (772
. 4194196) (833 . 4194195) (839 . 4194196) (857 . 4194195))
However, each of them encountered characters it couldn't encode:
iso-2022-7bit-dos cannot encode these: \222 \222 \223 \224 \223 \224 \223 \224 \223 \224 ...
utf-8-dos cannot encode these: \222 \222 \223 \224 \223 \224 \223 \224 \223 \224 ...
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
thai-tis620
Remember that as soon as the data is off of your server, e.g., placed in an email, etc., there is no guarantee the encoding is passed along, and chances are that it is not. Byte marks and other invisible means of identifying documents do not work as promised, let alone transient methods such as HTTP headers which are lost as soon as the document moves beyond the context of your own carefully configured HTTP server.
The guiding principle of HTML is that it is a plain text markup language that, when properly used, is universally compatible with any system supporting the most basic of text. HTML documents should use HTML entities for any characters outside of the normal 7-bit US-ASCII character set. Any other characters have different binary definitions depending on the encoding used and may even vary between single-byte and multi-byte representations.
Within Non-HTML documents you can feel free to use raw symbols because when you embed them within either their native file format or within HTML you can ensure that you specify the "right" character encoding, i.e., the one that will be recognized by the system where you authored it and any system compatible with that.
and not break a line – Lefsler May 16 '13 at 18:20