0

I am trying to read all the text from a text file. It works fine for English but fails for Spanish, French, and other languages, and I need to be able to read a file in any language. I am using File.ReadAllText(filepath, Encoding.UTF8). I have tried Encoding.UTF8, Encoding.Default, and others, but the text still comes back with unwanted (garbled) characters. How can I resolve this?

Sathish

2 Answers

1

Do you know what encoding your file uses? If not, you can try the solution mentioned here. Detecting an encoding programmatically is only ever a best guess: there are many possible encodings, and the result can be surprising. Note that the approach below only recognizes files that start with a byte order mark (BOM); a file without one falls through to the default. Below is the code I picked up from that link.

using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no known BOM is found.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read up to four bytes; n is how many were actually read (short files give fewer)
    var bom = new byte[4];
    int n;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read)) n = file.Read(bom, 0, 4);

    // Analyze the BOM. UTF-32LE is tested before UTF-16LE because both start with FF FE.
    // Encoding.UTF7 is marked obsolete on .NET 5 and later and produces a compiler warning there.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (n == 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
    if (n == 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
    return Encoding.ASCII;
}
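Files saved in a legacy single-byte encoding (common for Spanish or French text) usually have no BOM at all, so a BOM check alone cannot identify them. One possible complement, sketched here as an assumption rather than as part of the linked solution, is to decode as strict UTF-8 first and fall back to a caller-chosen encoding when the bytes are not valid UTF-8. The names TextFileReader and ReadText are hypothetical, and ISO-8859-1 is only an example fallback; the right one depends on where the file came from.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // Hypothetical helper: try strict UTF-8 (honoring any BOM), then fall back.
    public static string ReadText(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // throwOnInvalidBytes: true raises DecoderFallbackException on malformed
            // UTF-8 instead of silently inserting U+FFFD replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);
            using (var reader = new StreamReader(new MemoryStream(bytes), strictUtf8, true))
            {
                // The third argument lets StreamReader switch to UTF-16/UTF-32
                // when the data starts with one of those BOMs.
                return reader.ReadToEnd();
            }
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the caller-supplied legacy encoding.
            return fallback.GetString(bytes);
        }
    }

    public static void Main()
    {
        // Demo: write Spanish text as ISO-8859-1 bytes (no BOM), then read it back.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, latin1.GetBytes("señor, café"));
        Console.WriteLine(ReadText(path, latin1));
        File.Delete(path);
    }
}
```

The fallback is still a guess: any byte sequence decodes "successfully" as ISO-8859-1, so this only helps when you know which legacy encoding your files are likely to use.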
Community
Sameer
0

You can detect the file's encoding with a library such as chardetsharp (https://code.google.com/p/chardetsharp/), and then convert the text to the encoding you need.
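The conversion step might look like the sketch below. It deliberately does not call chardetsharp, since that library's API is not shown here; the sourceEncoding parameter stands in for whatever encoding the detector reports. EncodingConverter and ConvertToUtf8 are hypothetical names, and UTF-8 is assumed as the desired target.

```csharp
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Hypothetical helper: re-save a text file as UTF-8.
    // sourceEncoding is assumed to come from a detection step done elsewhere.
    public static void ConvertToUtf8(string sourcePath, Encoding sourceEncoding, string destPath)
    {
        // Decode the original bytes using the detected encoding.
        string text = File.ReadAllText(sourcePath, sourceEncoding);

        // Write UTF-8 with a BOM so that later BOM-based checks can recognize the file.
        File.WriteAllText(destPath, text, new UTF8Encoding(true));
    }
}
```

After conversion, File.ReadAllText(destPath, Encoding.UTF8) should return the same text that was decoded from the source, provided the source encoding was identified correctly.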

bobah75