3

My problem is that I want to parse a file and detect a special char ('Â') to do some stuff. I couldn't detect it, although detection worked with normal chars like 'a'. So I tried to understand where the problem was coming from and created a .txt file whose only char is 'Â'.

string a = File.ReadAllText("C:/example/example/test.txt");
Console.WriteLine(a.Length);

The console prints 0. It's as if the char doesn't exist. So I tried different encodings (File.ReadAllText with UTF-8, UTF-16, Unicode and so on) and got the same result.

I really don't know what to do, thanks in advance!

Sifflex
  • 2
    There's nothing wrong with your code. Are you sure you're reading the correct file? – DavidG Jul 04 '18 at 17:01
  • I would try specifying an `Encoding` with this overload: [File.ReadAllText](https://msdn.microsoft.com/en-us/library/ms143369.aspx) – Felipe Oriani Jul 04 '18 at 17:02
  • If you made sure that you are indeed reading the correct text file and the problem persists, then first check the file size of this file. If the file size is not zero, then take a hex viewer/hex editor and look at the bytes inside the text file. What do you see? –  Jul 04 '18 at 17:10
  • I just tried ReadAllText() with the 'Â' character and it works perfectly fine. Can you post your file? – Debashish Saha Jul 04 '18 at 17:15
  • Can you run the `[System.Text.Encoding]::Default` command in PowerShell on your system? – Debashish Saha Jul 04 '18 at 17:20
  • 1
    Please show the binary content of the file - if you use `byte[] data = File.ReadAllBytes(...);` and then `Console.WriteLine(BitConverter.ToString(data));` what does that show? – Jon Skeet Jul 04 '18 at 17:48
  • 2
    The 'Â' character is fairly special, you'll often see it back in a utf-8 encoded text file. Along with other accented A characters, 0xC0 and up are common bytes in such a file when it encodes text in a Latin alphabet. It won't be 'Â' anymore after StreamReader has applied its Encoding. A return value of 0 is expected, such characters require more than 1 byte to encode. So first thing you want to do is make sure that you are not looking for that character for the wrong reason. – Hans Passant Jul 04 '18 at 18:05
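
Following the byte-dump suggestion in the comments above, here is a minimal sketch of how to inspect the raw bytes and see how the chosen encoding changes the decoded length. The temp-file path and the class name `EncodingProbe` are made up for illustration, and the sketch assumes 'Â' was saved as UTF-8 without a BOM; the real file may well contain something else.

```csharp
using System;
using System.IO;
using System.Text;

class EncodingProbe
{
    static void Main()
    {
        // Hypothetical path; substitute the file you are actually reading.
        string path = Path.Combine(Path.GetTempPath(), "test.txt");

        // Assumption for the demo: 'Â' (U+00C2) stored as UTF-8, i.e. bytes C3 82.
        File.WriteAllBytes(path, new byte[] { 0xC3, 0x82 });

        // Look at the raw bytes first, before any decoding happens.
        byte[] data = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(data)); // C3-82

        // Decode the same bytes with two different encodings.
        string asUtf8 = Encoding.UTF8.GetString(data);
        string asLatin1 = Encoding.GetEncoding("ISO-8859-1").GetString(data);

        Console.WriteLine(asUtf8.Length);      // 1: the single char U+00C2
        Console.WriteLine(asLatin1.Length);    // 2: each byte becomes its own char
        Console.WriteLine(asUtf8 == "\u00C2"); // True
    }
}
```

If the byte dump of the real file is empty, the problem is the file or the path rather than the encoding.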

3 Answers

1

I changed the project target framework and this error message appeared. Try adding "System.IO." before File.ReadAllLines.

e.g. Instead of:

File.ReadAllLines(text);

Use:

System.IO.File.ReadAllLines(text);

This happens when you are using two different libraries that expose a function with the same name. In that case you need to qualify the call with the proper namespace as well.

moh99
-1

It works if you set the encoding to Default:

string result = File.ReadAllText("test.txt", Encoding.Default);

This will give you the "Â".

Simão Ferreira
  • 1
    This does not guarantee you will get that character. It only applies the system's default encoding to the file that the stream is going to read. – Debashish Saha Jul 04 '18 at 17:22
  • We have no information to suggest that the OP's file is actually using the system default encoding. – Jon Skeet Jul 04 '18 at 17:47
  • 1
    Encoding.Default should only be used in very rare cases and generally be avoided. It may seem to work, but in the moment when you decide to export this file to another system or import a file from another system you basically have broken code as now everything is left up to chance. – ckuri Jul 04 '18 at 18:10
  • Exactly, the default might be different depending on each user's settings. – Debashish Saha Jul 04 '18 at 18:49
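
As an illustration of the point made in these comments, here is a rough sketch of reading with an explicitly chosen, strict encoding instead of `Encoding.Default`, so that a mismatch fails loudly instead of silently producing the wrong characters. The file name, the sample bytes and the class name `ExplicitEncodingDemo` are assumptions for the demo, not details from the question.

```csharp
using System;
using System.IO;
using System.Text;

class ExplicitEncodingDemo
{
    static void Main()
    {
        // Hypothetical sample: "ÂA" saved as ISO-8859-1 (bytes C2 41).
        // C2 followed by 41 is not a valid UTF-8 sequence.
        string path = Path.Combine(Path.GetTempPath(), "latin1-sample.txt");
        File.WriteAllBytes(path, new byte[] { 0xC2, 0x41 });

        // Strict UTF-8: no BOM on write, throw on invalid bytes on read.
        var strictUtf8 = new UTF8Encoding(false, true);

        try
        {
            string text = File.ReadAllText(path, strictUtf8);
            Console.WriteLine("Valid UTF-8, " + text.Length + " char(s)");
        }
        catch (DecoderFallbackException)
        {
            // The bytes do not match the encoding we asked for.
            Console.WriteLine("Not valid UTF-8; the file uses another encoding.");
        }
    }
}
```

Naming the encoding in code keeps the behavior the same on every machine, whereas `Encoding.Default` depends on the system the program runs on.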
-1

You are trying to read Latin character(s), which use the ISO-8859-1 encoding. Try the code below:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
string a = File.ReadAllText("C:/example/example/test.txt", iso);
Console.WriteLine(a.Length);
Prany
  • 1
    "You are trying to read Latin character(s) which is 8859-1 encoding" - the same characters can be represented in many encodings. You can't tell just from the character which encoding is being used. – Jon Skeet Jul 04 '18 at 17:46