Read UNIX encoded file with C#

Question

I have a C# program that we use to replace some values with others, which are then used as parameters. For example, 'NAME1' is replaced with &1, 'NAME2' with &2, and so on.

The problem is that the data to modify is in a text file encoded on UNIX, and special characters like í are read as a square (an invalid character), even in memory. Due to specifications that are out of my control, the file can't be changed, so I have no choice but to read it as it is.
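A likely explanation for the square, assuming the file uses a single-byte Latin code page: File.ReadAllText(path) without an encoding argument decodes as UTF-8 (unless a byte order mark says otherwise). A lone byte such as 0xED, which is í in ISO-8859-1 and Windows-1250/1252, is not valid UTF-8, so the decoder substitutes U+FFFD, a replacement character that editors often draw as a box. The byte values and the SquareDemo name below are illustrative assumptions, not taken from the actual file.

```csharp
using System.Text;

static class SquareDemo
{
    // Hypothetical sample: "María" as a single-byte Latin encoding stores it; 'í' is the one byte 0xED.
    public static readonly byte[] LatinBytes = { 0x4D, 0x61, 0x72, 0xED, 0x61 };

    // Decoding as UTF-8 (File.ReadAllText's default when there is no BOM) hits an
    // invalid sequence at 0xED, which is replaced with U+FFFD.
    public static string AsUtf8() => Encoding.UTF8.GetString(LatinBytes);

    // Decoding with the encoding the bytes were actually written in restores 'í'.
    public static string AsLatin1() => Encoding.GetEncoding("iso-8859-1").GetString(LatinBytes);
}
```

The same byte also means í in several other single-byte code pages, which would explain why more than one encoding appears to "work".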

I have tried to read it with most of the 130 encodings C# offers me, using:

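using System.IO;
using System.Text;

EncodingInfo[] info = System.Text.Encoding.GetEncodings();
string text;
for (int a = 0; a < info.Length; ++a)
{
    // Decode the file with encoding number a and save a copy (fn + a) in that same encoding
    text = File.ReadAllText(fn, info[a].GetEncoding());
    File.WriteAllText(fn + a, text, info[a].GetEncoding());
}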

fn is the path of the file to read. I have checked all the generated files (about 130), and none of them writes the í properly, so I'm out of ideas and unable to find anything on the internet.
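Rather than opening about 130 output files by hand, the check could be automated: decode the raw bytes with every encoding the runtime lists and keep only those whose result contains í and no U+FFFD replacement character. This is a minimal sketch; EncodingProbe and Candidates are hypothetical names, and the set returned by Encoding.GetEncodings() depends on the runtime (.NET Core and .NET 5+ list far fewer encodings than .NET Framework). Several single-byte encodings will usually pass, so this narrows the candidates rather than proving which one the file was written in.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EncodingProbe
{
    // Decodes `bytes` with every encoding the runtime lists and returns the IANA
    // names of those whose output contains `expected` and no U+FFFD.
    public static List<string> Candidates(byte[] bytes, char expected)
    {
        var result = new List<string>();
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Encoding enc;
            try
            {
                enc = info.GetEncoding();
            }
            catch (Exception e) when (e is ArgumentException || e is NotSupportedException)
            {
                continue; // listed but not usable on this runtime
            }
            string text = enc.GetString(bytes);
            if (text.IndexOf(expected) >= 0 && text.IndexOf('\uFFFD') < 0)
                result.Add(info.Name);
        }
        return result;
    }
}
```

Usage would be something like EncodingProbe.Candidates(File.ReadAllBytes(fn), 'í'), ideally on a file known to contain that character.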

SOLUTION:

It looks like this code finally did the job of reading the text properly. I also had to use the same encoding for the writing part:

System.Text.Encoding encoding = System.Text.Encoding.GetEncodings()[41].GetEncoding();

String text = File.ReadAllText(fn, encoding); // get file text 

// DO ALL THE STUFF I HAD TO

File.WriteAllText(fn, text, encoding); // write back with the same encoding

/* ALL THESE ENCODINGS APPARENTLY WORKED FOR ME WITH ALL THE WEIRD CHARS I WAS ABLE TO WRITE :P
    System.Text.Encoding.GetEncodings()[115].GetEncoding();     //Latin 9 (ISO)
    System.Text.Encoding.GetEncodings()[108].GetEncoding();     //Baltic (ISO)
    System.Text.Encoding.GetEncodings()[107].GetEncoding();     //Latin 3 (ISO)
    System.Text.Encoding.GetEncodings()[106].GetEncoding();     //Central European (ISO)
    System.Text.Encoding.GetEncodings()[105].GetEncoding();     //Western European (ISO)
    System.Text.Encoding.GetEncodings()[49].GetEncoding();      //Vietnamese (Windows)
    System.Text.Encoding.GetEncodings()[45].GetEncoding();      //Turkish (Windows)
    System.Text.Encoding.GetEncodings()[41].GetEncoding();      //Central European (Windows)   <-- Used this one
    */
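The same read-then-write-with-the-same-encoding approach could be wrapped in a reusable helper. ParamReplacer, ReplaceInFile and the map argument are hypothetical names. Choosing the encoding by code page, for example Encoding.GetEncoding(1250) for Central European (Windows), avoids relying on the position of an entry in GetEncodings(), which may differ between machines and runtimes. On .NET Core and .NET 5+, code page 1250 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ParamReplacer
{
    // Hypothetical helper: reads `path` with `encoding`, replaces every key of `map`
    // with its value, and writes the result back with the SAME encoding so that
    // characters such as 'í' keep their original byte values.
    public static void ReplaceInFile(string path, IDictionary<string, string> map, Encoding encoding)
    {
        string text = File.ReadAllText(path, encoding);
        foreach (KeyValuePair<string, string> pair in map)
            text = text.Replace(pair.Key, pair.Value);
        File.WriteAllText(path, text, encoding);
    }
}
```

A possible call: ParamReplacer.ReplaceInFile(fn, new Dictionary<string, string> { { "NAME1", "&1" }, { "NAME2", "&2" } }, Encoding.GetEncoding(1250)).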

Thank you very much for your help

Noman(1)

What encoding was the file written in? Without knowing that, all you have to go on is guessing. That it is on a UNIX machine is irrelevant. — Oded, May 08 '12 at 14:52
+1 for automated guessing! But now you have to go back to your source to find out, as Oded says, 'what encoding was the file written in?'. Good luck! — shellter, May 08 '12 at 14:54
I'm sorry, but I can't know the source. The only thing I know is that Notepad shows it as UNIX ANSI in the status bar. It's created by a .bat file that runs copy [somefiles with *] myFile.txt. I assume most of the source files were created with the "Save" function in Oracle or by an Excel script. — Noman_1, May 08 '12 at 15:42
Found the solution but can't post it until 6 more hours have passed =) — Noman_1, May 08 '12 at 16:24
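Oded's question can be partly answered from the bytes themselves. If the file is not valid UTF-8, it is most likely in a legacy 8-bit code page, which is typically what an editor labels ANSI; the UNIX label most likely refers to LF line endings rather than to the character encoding. Below is a minimal sketch using a strict UTF8Encoding that throws on invalid bytes; Utf8Check and IsValidUtf8 are hypothetical names.

```csharp
using System.Text;

static class Utf8Check
{
    // Returns true if `bytes` decodes as UTF-8 without any invalid sequences.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes GetString throw instead of inserting U+FFFD
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Calling Utf8Check.IsValidUtf8(File.ReadAllBytes(fn)) and getting false would rule out UTF-8 and point toward the single-byte encodings.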

score 2 · Answer 1 · edited May 23 '17 at 12:32


You have to find out the proper encoding. Try:

Use file -i. It outputs MIME-type information for the file, which also includes the character-set encoding. I found a man page for it, too :)

Or try enca.

It can guess and even convert between encodings. Just look at the man page.

Once you know the proper encoding, look for a way to apply it when reading your file.
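To apply that result in C#, the charset= value printed by file -i can often be passed straight to Encoding.GetEncoding(string). This is a sketch; CharsetMapper and FromFileOutput are hypothetical names. Values such as unknown-8bit or binary, which file may print, are not encoding names .NET recognizes, so the helper returns null for them. On .NET Core and .NET 5+, names like windows-1252 also need CodePagesEncodingProvider to be registered first.

```csharp
using System;
using System.Text;

static class CharsetMapper
{
    // Extracts the charset from a `file -i` line such as
    // "myFile.txt: text/plain; charset=iso-8859-1" and maps it to a .NET Encoding.
    // Returns null if no charset is present or the name is not recognized.
    public static Encoding FromFileOutput(string line)
    {
        const string key = "charset=";
        int idx = line.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (idx < 0) return null;
        string name = line.Substring(idx + key.Length).Trim();
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            return null; // e.g. "unknown-8bit" or "binary"
        }
        catch (NotSupportedException)
        {
            return null;
        }
    }
}
```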

Quoted from: How to find encoding of a file in Unix via script(s)

edited May 23 '17 at 12:32 by Community

answered May 08 '12 at 14:57 by sschrass

