I have created a web site that is valid XHTML Strict and passes validation, but the W3C validator gives me a note (error):

Byte-Order Mark found in UTF-8 File.

The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

But as far as I can tell, I have no BOM in my file. It's straight XHTML written in Visual Studio.

Is the server adding it? How can I get rid of the error?

This is important as it screws up semantic extraction. http://www.w3.org/2003/12/semantic-extractor.html

Karl Wilson

2 Answers

You do have a BOM (EF BB BF) in your resource. Consider removing it, perhaps with a hex editor. See also: How do I remove the BOM character from my xml file
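
If a hex editor is not at hand, the same check can be done with a few lines of script. This is only a minimal Python sketch, assuming the file is UTF-8; the helper names `has_utf8_bom`, `strip_utf8_bom`, and `remove_bom_from_file` are made up for illustration and are not part of any library:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False  # nothing to do, leave the file untouched
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

Calling `remove_bom_from_file("index.html")` (a hypothetical file name) would rewrite that file in place, so it is worth keeping a backup first. Running `has_utf8_bom` on the bytes of the saved file and on the bytes actually served over HTTP may also help tell whether the editor or the server is adding the BOM.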

Alexander Pavlov
  • Where and how did you see this? I cannot. I am not sure how to remove it, as I cannot see it. I've tried View Source. – Karl Wilson Jun 19 '12 at 15:02
  • BOM is not intended for display and thus is not displayed by ordinary text viewers/editors. Use a hex viewer for that. – Alexander Pavlov Jun 19 '12 at 15:02
  • What is a good viewer to use on Windows, please? And where did the characters come from: is it Visual Studio or the web server? – Karl Wilson Jun 19 '12 at 15:20
  • I have found a good viewer/editor. Thanks so much for your help. Any idea how they got there? – Karl Wilson Jun 19 '12 at 15:28
  • They are written by the text editor and serve to tell any software reading the file explicitly about its encoding (UTF-8 in your case). – Alexander Pavlov Jun 19 '12 at 15:37
  • It's mad. Semantic extraction is key to good coding, and I take a lot of care sculpting my structure. I need search engines to understand me more than anything. Thank you so much for your help. I hope you have a good day, as I see you answer a lot of questions for people, and people like you are of great help. – Karl Wilson Jun 19 '12 at 15:43
  • You are welcome. Alas, any software that understands UTF-8 should also parse the BOM properly, and if it doesn't, you should file a bug :) – Alexander Pavlov Jun 19 '12 at 15:46
  • The problem with the BOM in HTML usually is that it appears before your DOCTYPE, thus adding some characters at the top left of your page, and possibly messing up your encoding, and triggering Quirks mode in old IE. To remove it, you can use [Notepad++](http://notepad-plus-plus.org/); then, from the menubar, *Encoding->Convert to UTF-8 (without BOM)*. – avramov Jun 24 '12 at 07:05

The W3C Markup Validator does not indicate a BOM in UTF-8 as an error; it would itself be in error if it did, since a BOM is allowed at the start of UTF-8 data. It issues a warning.

The warning is seriously outdated. No problems have been observed in relevant browsers for many years. On the contrary, the BOM should be regarded as useful: if, for example, a file is saved locally (and the HTTP headers are thus lost), the BOM in a UTF-8 file lets browsers infer, with practical certainty, that the document is UTF-8 encoded.
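
To make the idea concrete, here is a rough Python sketch of how a consumer could infer the encoding from a leading BOM when no HTTP headers are available. It is an illustration only, not how any particular browser implements it; the names `sniff_bom_encoding` and `decode_with_bom` are hypothetical, and only the UTF-8 and UTF-16 BOMs are handled as a simplification:

```python
import codecs
from typing import Optional

# Known BOM byte sequences and the encoding each one signals.
# Assumption: UTF-32 is ignored here to keep the sketch short.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom_encoding(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM's encoding if present, dropping the BOM itself."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    # No BOM: the caller has to guess; the fallback is an assumption.
    return data.decode(fallback)
```

Without a BOM the function has to fall back on a guess, which is exactly the situation the BOM avoids for a locally saved file.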

The Semantic data extraction tool is not very up to date, and it suffers from an overly theoretical approach, but it does not seem to have any problem with a BOM at the start of UTF-8 data.

It is possible that the server adds the BOM, or that your authoring tool adds it. Either way, it should be considered as useful, rather than a problem.

Jukka K. Korpela