
I'm working on a tool for validating XML files grabbed from a mainframe. For reasons beyond my control every XML file is encoded in ISO 8859-1.

<?xml version="1.0" encoding="ISO 8859-1"?>

My C# application uses the System.Xml library to parse the XML and eventually extract a message string contained in one of the child nodes.

If I manually remove the XML encoding line, it works just fine. But I'd like to find a solution that doesn't require manual intervention. Are there any elegant approaches to solving this? Thanks in advance.

The exception that is thrown reads as:

'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31

My code is

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("//fileLocation");
Filburt
Reed
  • Show your code pls. – Daniel Stackenland May 02 '17 at 14:46
  • What's the actual code? What's the actual message of the error? What's the actual document? `XDocument.Parse("")` works fine. (That's `System.Xml.Linq` of course, not plain old `System.Xml`.) If all else fails, obviously, you could just do a `String.Replace` to strip out the directive before parsing the result as XML. – Jeroen Mostert May 02 '17 at 14:47
  • The error it throws is: 'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31. Put simply, the code is: `XmlDocument xmlDoc = new XmlDocument(); xmlDoc.Load("//fileLocation");`. And that's the line where it fails. – Reed May 02 '17 at 14:50
  • Sorry I edited it into my reply, I didn't know that pressing enter submitted my comment. I'm new to this site - apologies. – Reed May 02 '17 at 14:53
  • Hold shift if you need a new line :) – Alex May 02 '17 at 14:54
  • The problem is that `ISO 8859-1` is not a recognized built-in encoding name. `ISO-8859-1` is. Silly but true. Prior to .NET 4.6 (which allows you to register additional encoding providers) I don't know if there's a way to add encoding aliases, but I doubt it. Based on that there's a host of possible workarounds, though (the easiest being adding the hyphen...), like first reading the file manually and then using `.LoadXml` (which ignores the encoding in the directive, as the string is already necessarily UTF-16 internally). If the file is too large for that, you'll have to get more subtle. – Jeroen Mostert May 02 '17 at 15:10
  • The solution described in [this answer to _How to prevent System.Xml.XmlException: Invalid character in the given encoding_](http://stackoverflow.com/a/8275868/1336654) works. – Jeppe Stig Nielsen May 02 '17 at 15:14
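
The encoding-provider route mentioned in the comments (available from .NET 4.6) could look roughly like the sketch below. The class names `Iso88591AliasProvider` and `AliasRegistration` are made up for illustration. It assumes the XML reader resolves the declared name through `Encoding.GetEncoding`, so treat it as something to try rather than a guaranteed fix.

```csharp
using System;
using System.Text;

// Hypothetical provider (name chosen for illustration) that maps the
// unhyphenated alias "ISO 8859-1" onto the built-in Latin-1 encoding.
sealed class Iso88591AliasProvider : EncodingProvider
{
    private const int Latin1CodePage = 28591; // code page of ISO-8859-1

    // No extra code pages are supplied; returning null defers to the built-in lookup.
    public override Encoding GetEncoding(int codepage)
    {
        return null;
    }

    // Answer only for the alias; every other name falls through to the defaults.
    public override Encoding GetEncoding(string name)
    {
        return string.Equals(name, "ISO 8859-1", StringComparison.OrdinalIgnoreCase)
            ? Encoding.GetEncoding(Latin1CodePage)
            : null;
    }
}

static class AliasRegistration
{
    // Call once at startup, before any XML is loaded.
    public static void Register()
    {
        Encoding.RegisterProvider(new Iso88591AliasProvider());
    }
}
```

After `AliasRegistration.Register()` has run, `Encoding.GetEncoding("ISO 8859-1")` should resolve to Latin-1 instead of throwing.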

1 Answer


As Jeroen pointed out in a comment, the encoding should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

not:

<?xml version="1.0" encoding="ISO 8859-1"?>

(the hyphen after ISO is missing).
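
A quick way to see the difference between the two names is to ask .NET whether it recognizes each one. This is only a small sketch; the helper name `IsKnownEncoding` is invented for illustration. With the default encoding providers, the hyphenated name is expected to be accepted and the unhyphenated one rejected.

```csharp
using System;
using System.Text;

static class EncodingNameCheck
{
    // Returns true when .NET recognizes the given encoding name.
    // Encoding.GetEncoding throws ArgumentException for unknown names.
    public static bool IsKnownEncoding(string name)
    {
        try
        {
            Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsKnownEncoding("ISO-8859-1")); // hyphenated name
        Console.WriteLine(IsKnownEncoding("ISO 8859-1")); // name as written in the mainframe files
    }
}
```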

You can use a StreamReader with an explicit encoding to read the file anyway:

using (var reader = new StreamReader("//fileLocation", Encoding.GetEncoding("ISO-8859-1")))
{
  var xmlDoc = new XmlDocument();
  xmlDoc.Load(reader);
  // ...
}

(from the answer by competent_tech in the other thread, which I linked in an earlier comment).

If you do not want the using statement, I guess you can do:

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));

Instead of XmlDocument, you can use the XDocument class in the namespace System.Xml.Linq if you reference the assembly System.Xml.Linq.dll (available since .NET 3.5). It has static methods like Load(TextReader) and Parse(string), which you can use in the same way as above.
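
A minimal sketch of the XDocument variants, mirroring the two XmlDocument snippets above. The class and method names (`XDocumentLatin1`, `LoadLatin1`, `ParseLatin1`) are made up for illustration, and the `path` argument stands in for the real file location.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

static class XDocumentLatin1
{
    // Streams the file through a StreamReader that decodes it as ISO-8859-1,
    // so the bytes are decoded by the reader rather than by the XML parser.
    public static XDocument LoadLatin1(string path)
    {
        using (var reader = new StreamReader(path, Encoding.GetEncoding("ISO-8859-1")))
        {
            return XDocument.Load(reader);
        }
    }

    // Reads the whole file into a string first, then parses that string.
    public static XDocument ParseLatin1(string path)
    {
        string xml = File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1"));
        return XDocument.Parse(xml);
    }
}
```

The `Parse` variant holds the entire file in memory as a string, so the `Load` variant is probably the better fit for large files.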

Jeppe Stig Nielsen
  • I ended up using `xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));` and it works great. Thanks! – Reed May 02 '17 at 18:03