I have a rather curious problem. I'm using the XElement.Load method to load an HTML document (which is well formed, as checked with HTML Tidy). This works perfectly for English documents, but when I move to French and Spanish documents I get an XML exception:

XML Exception
Invalid character in the given encoding. Line 23, position 43.

The method call

XElement doc = XElement.Load("example1.html", LoadOptions.None);

Snippet of the HTML document

<font face="Arial" size="3" color="#ffffff">
Le test <b> exemple français, qui devrait éventuellement être suivie d'un texte en langue espagnole. </b>
</font>

I realise my HTML does not declare an encoding at the start of the file. Is there a way around this?

skaffman
wonea

1 Answer

Because you're using XElement rather than XDocument, you can't set the character encoding. Use XDocument instead and set the encoding to UTF-8:

http://msdn.microsoft.com/en-us/library/bb387063.aspx
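
A minimal sketch of one way around the missing encoding declaration. It assumes the file was saved as ISO-8859-1 (the question does not say which encoding was actually used) and that the error comes from the parser falling back to UTF-8 when it finds neither a byte-order mark nor an XML declaration. The class and method names (EncodingAwareLoader, Load, SaveAsUtf8) are hypothetical, made up for this illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

static class EncodingAwareLoader
{
    // Hypothetical helper: decode the file with an explicitly chosen encoding
    // instead of letting the XML parser guess. The StreamReader does the
    // decoding, so the parser only ever sees characters, not raw bytes.
    public static XDocument Load(string path, Encoding sourceEncoding)
    {
        using (var reader = new StreamReader(path, sourceEncoding))
        {
            return XDocument.Load(reader, LoadOptions.None);
        }
    }

    // Hypothetical helper: attach an XML declaration that names UTF-8 and
    // write the document back out, so later loads need no special handling.
    public static void SaveAsUtf8(XDocument doc, string path)
    {
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        doc.Save(path);
    }
}
```

Usage would look something like EncodingAwareLoader.Load("example1.html", Encoding.GetEncoding("ISO-8859-1")). If the file turns out to be in a different encoding, such as Windows-1252, pass that one instead. Re-saving the source file as UTF-8 in an editor should also avoid the exception.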

James Walford
    Thanks for pointing me in the right direction. After some digging around I also found this: http://stackoverflow.com/questions/310669/why-does-c-xmldocument-loadxmlstring-fail-when-an-xml-header-is-included – wonea Jan 06 '11 at 10:34