I have an application that creates XML documents based on existing ones. Today I noticed that an error occurs if the opened file is ANSI-encoded. Until now I had only worked with UTF-8 files, and the problem did not arise. How can I fix this?

Code fragments:

string filepath;
XmlDocument xdoc = new XmlDocument();
XmlElement root;
...............
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
    filepath = openFileDialog1.FileName;
    textBox1.Text = filepath;
    load();
}
...............
public void load()
{
    xdoc.Load(filepath);
    root = xdoc.DocumentElement;
...............

Error:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll Additional information: An invalid character for the specified encoding., Line 35, position 16.

That line contains Cyrillic characters (Russian). If I convert the document to UTF-8 with Notepad++, it loads correctly.

pad0n

1 Answer

You could use a StreamReader to read the file with the correct encoding and then pass that reader to the XmlDocument.Load overload that accepts a TextReader.

using(var sr = new StreamReader(filepath, myEncoding))
{
   xdoc.Load(sr);
}

You can obtain myEncoding via the Encoding.GetEncoding method.
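
As a minimal sketch of how the two pieces fit together: the helper below looks up an encoding by its code page number and loads the document through a StreamReader. The class and method names (`XmlLoader`, `LoadWithCodePage`) are hypothetical and not part of the original code; 1251 is the Windows Cyrillic code page, which is only an assumption about what "ANSI" means for this particular file.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlLoader
{
    // Hypothetical helper: loads an XML file whose bytes are in the given code page.
    public static XmlDocument LoadWithCodePage(string path, int codePage)
    {
        // Assumption: on .NET Core / .NET 5+ legacy code pages such as 1251 are only
        // available after registering this provider. On the classic .NET Framework
        // they are built in, and this line can be dropped (there the provider type
        // comes from the System.Text.Encoding.CodePages package).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding myEncoding = Encoding.GetEncoding(codePage);

        var doc = new XmlDocument();
        // The reader decodes the bytes; XmlDocument only ever sees characters.
        using (var sr = new StreamReader(path, myEncoding))
        {
            doc.Load(sr);
        }
        return doc;
    }
}
```

With the question's variables this would be called as `xdoc = XmlLoader.LoadWithCodePage(filepath, 1251);`.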

keyboardP
  • That didn't really help. There is no error now, but the data did not load correctly. Here's a screenshot: http://petromi.com/get/90f1577ad1.jpeg I did it with: `Encoding utf8 = new UTF8Encoding(false);` – pad0n Aug 14 '13 at 12:23
  • You shouldn't be using `UTF8` encoding. Instead, you need to use the relevant code page. I noticed you've added that the text is Cyrillic, so try `Encoding myEncoding = Encoding.GetEncoding(1251);` – keyboardP Aug 14 '13 at 12:26
  • Yes, these are Cyrillic characters. I tried your second method, and this document (with ANSI encoding) now loads correctly, but the documents that were UTF-8 now load incorrectly (http://petromi.com/get/2893929137.jpeg). How can I detect the file's encoding so I can decide which loading method to use? Or is there an easier way to get all the files to load correctly? – pad0n Aug 14 '13 at 12:38
  • Accurately detecting a file's encoding is tricky, but this thread has various answers: http://stackoverflow.com/questions/90838/how-can-i-detect-the-encoding-codepage-of-a-text-file. I'm not sure of a way that will always work (although someone else might know). This is a hack, but if you know that there will only be two encodings, you could try loading with one encoding inside a Try...Catch block and, if an exception is thrown, try loading again with the other encoding. Of course, if there could be a number of encodings, this is a bad approach. – keyboardP Aug 14 '13 at 12:43
  • I was thinking about try/catch, but then I found that there can be at least 3 encodings, as far as I know =(( But thank you for helping me :) – pad0n Aug 14 '13 at 12:52
  • No problem :) I think the best bet would be to attempt to determine the file's encoding via the methods shown in that linked thread, or to give the user the option to choose. – keyboardP Aug 14 '13 at 12:53
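
The try-one-encoding-then-the-other hack from the comments could look roughly like the sketch below. It is a heuristic, not reliable detection. It assumes the file is either UTF-8 or a single known legacy code page, and the names `XmlEncodingFallback` and `LoadWithFallback` are hypothetical. The idea is to decode with a strict UTF-8 decoder, which throws on invalid byte sequences instead of silently substituting characters, and to retry with the fallback encoding only when that happens.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlEncodingFallback
{
    // Hypothetical helper: try strict UTF-8 first, then the caller's fallback encoding.
    public static XmlDocument LoadWithFallback(string path, Encoding fallback)
    {
        // Read the bytes once so both decoding attempts see the same data.
        byte[] bytes = File.ReadAllBytes(path);

        // throwOnInvalidBytes: true makes the decoder raise DecoderFallbackException
        // on byte sequences that are not valid UTF-8.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        string text;
        try
        {
            text = Decode(bytes, strictUtf8);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the legacy encoding supplied by the caller.
            text = Decode(bytes, fallback);
        }

        var doc = new XmlDocument();
        doc.LoadXml(text);
        return doc;
    }

    private static string Decode(byte[] bytes, Encoding encoding)
    {
        // StreamReader strips a byte order mark if one is present, which
        // Encoding.GetString would leave in the string as a stray character.
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding,
                                             detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's code this would be used as `xdoc = XmlEncodingFallback.LoadWithFallback(filepath, Encoding.GetEncoding(1251));`. Text in a legacy code page can occasionally happen to be valid UTF-8, so letting the user override the choice remains a sensible safety net.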