
I have a C# program we use to replace some values with others, which are then used as parameters. For example, 'NAME1' is replaced with &1, 'NAME2' with &2, and so on.
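The replacement step itself could look roughly like this. It is a minimal sketch that assumes the names appear in the text wrapped in single quotes and arrive as an ordered list; ParameterReplacer is a hypothetical name, not part of the original program.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not from the original program): replaces each quoted
// name with a positional parameter.
public static class ParameterReplacer
{
    // names[0] -> &1, names[1] -> &2, ... Matching on the quotes as well keeps
    // 'NAME1' from also matching inside 'NAME10'.
    public static string Replace(string text, IReadOnlyList<string> names)
    {
        for (int i = 0; i < names.Count; i++)
        {
            text = text.Replace("'" + names[i] + "'", "&" + (i + 1));
        }
        return text;
    }
}
```

For example, ParameterReplacer.Replace("WHERE a = 'NAME1' AND b = 'NAME2'", new[] { "NAME1", "NAME2" }) gives "WHERE a = &1 AND b = &2".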

The problem is that the data to modify is in a text file encoded on UNIX, and special characters like í get read as a square (an invalid character), even in memory. Due to specifications that are out of my control, the file can't be changed, so I have no choice but to read it as it is.

I have tried reading it with most of the ~130 encodings C# offers me:

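// Read the file with every available encoding and save one copy per encoding
EncodingInfo[] info = System.Text.Encoding.GetEncodings();
string text;
for (int a = 0; a < info.Length; ++a)
{
    text = File.ReadAllText(fn, info[a].GetEncoding());     // decode with encoding #a
    File.WriteAllText(fn + a, text, info[a].GetEncoding()); // save as <fn>0, <fn>1, ...
}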
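Rather than eyeballing ~130 output files, it may be quicker to look at the raw bytes. If the file is read with File.ReadAllText(fn) and no encoding argument, .NET decodes it as UTF-8 (unless a byte order mark says otherwise), and a lone byte like 0xED is not valid UTF-8, so it becomes the replacement character U+FFFD, which is what shows up as a square. If í is stored as the single byte 0xED, the file uses a single-byte code page (ISO-8859-x / Windows-125x); if it is stored as C3 AD, it is UTF-8. A minimal sketch of both checks; EncodingProbe is a hypothetical helper:

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper: inspects raw bytes instead of decoded text.
public static class EncodingProbe
{
    // True if the bytes are valid UTF-8: the strict decoder throws on invalid sequences.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Hex dump of the first non-ASCII byte and the byte after it,
    // e.g. "offset 1: ED-62" for a single-byte í followed by 'b'.
    public static string DescribeFirstNonAscii(byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] > 0x7F)
            {
                int end = Math.Min(bytes.Length, i + 2);
                return $"offset {i}: " + BitConverter.ToString(bytes, i, end - i);
            }
        }
        return "all bytes are ASCII";
    }
}
```

Usage would be along the lines of Console.WriteLine(EncodingProbe.DescribeFirstNonAscii(File.ReadAllBytes(fn))).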

fn is the path of the file to read. I have checked all the generated files (about 130), and none of them writes the í properly, so I'm out of ideas and haven't been able to find anything on the internet.

SOLUTION:

It looks like this code finally did the job and read the text properly. I also had to use the same encoding for the writing part:

System.Text.Encoding encoding = System.Text.Encoding.GetEncodings()[41].GetEncoding();

String text = File.ReadAllText(fn, encoding); // get file text 

// DO ALL THE STUFF I HAD TO

File.WriteAllText(fn, text, encoding); // write back with the same encoding used for reading

/* ALL THESE ENCODINGS APPARENTLY WORKED FOR ME WITH ALL THE WEIRD CHARS I WAS ABLE TO WRITE :P
    System.Text.Encoding.GetEncodings()[115].GetEncoding(); //Latin 9 (ISO)
    System.Text.Encoding.GetEncodings()[108].GetEncoding(); //Baltic (ISO)
    System.Text.Encoding.GetEncodings()[107].GetEncoding(); //Latin 3 (ISO)
    System.Text.Encoding.GetEncodings()[106].GetEncoding(); //Central European (ISO)
    System.Text.Encoding.GetEncodings()[105].GetEncoding(); //Western European (ISO)
    System.Text.Encoding.GetEncodings()[49].GetEncoding();  //Vietnamese (Windows)
    System.Text.Encoding.GetEncodings()[45].GetEncoding();  //Turkish (Windows)
    System.Text.Encoding.GetEncodings()[41].GetEncoding();  //Central European (Windows)   <-- Used this one
*/
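One caveat with the indices above: the position of an encoding in the array returned by Encoding.GetEncodings() is not a documented contract and can differ between machines and .NET versions (on .NET Core / .NET 5+ the list is much shorter unless a code-pages provider is registered), so index 41 may point at something else elsewhere. Looking the encoding up by code page number is more stable. Also, in each of the encodings listed above the single byte 0xED decodes to í, which would explain why they all appeared to work. A minimal sketch, using ISO-8859-1 (code page 28591) because it is built into every .NET runtime; StableEncoding and RewriteFile are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads, transforms and rewrites a file with an encoding
// looked up by code page number instead of by array index.
public static class StableEncoding
{
    // 28591 = ISO-8859-1 (Western European), available on every .NET runtime.
    // 1250  = Central European (Windows), the one chosen above; on .NET Core /
    //         .NET 5+ it needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
    public static void RewriteFile(string path, int codePage, Func<string, string> transform)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);
        string text = File.ReadAllText(path, encoding); // decode with the fixed code page
        text = transform(text);                         // the "DO ALL THE STUFF" step
        File.WriteAllText(path, text, encoding);        // encode back with the same code page
    }
}
```

With the encoding chosen above, the call would be StableEncoding.RewriteFile(fn, 1250, t => /* replacements */ t), after registering the provider if running on .NET Core.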

Thank you very much for your help

Noman_1
  • What encoding was the file written in? Without knowing that, all you have to go on is guessing. That it is on a UNIX machine is irrelevant. – Oded May 08 '12 at 14:52
  • +1 for automated guessing! But now you have to go back to your source to find out, as Oded says, 'what encoding was the file written in?'. Good luck! – shellter May 08 '12 at 14:54
  • I'm sorry to say that I can't know the source. The only thing I know is that Notepad shows it at the bottom as UNIX ANSI. It's created from a bat file which does copy [somefiles with *] myFile.txt. I assume most of the source files were created by the "Save" function in Oracle or by an Excel script. – Noman_1 May 08 '12 at 15:42
  • Oops, I mean in Notepad++. – Noman_1 May 08 '12 at 15:48
  • Found the solution but can't post it until 6 more hours have passed =) – Noman_1 May 08 '12 at 16:24

1 Answer


You have to find out the proper encoding. Try:

Use file -i. That will output MIME-type information for the file, which will also include the character-set encoding. I found a man page for it, too :)

Or try enca.

It can guess and even convert between encodings. Just look at the man page.

Once you know the proper encoding, look for a way to apply it when reading the file.
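Applied in C#, one option is to pass the encoding explicitly to StreamReader and StreamWriter and stream the file line by line, which avoids holding a large file in memory. A minimal sketch, assuming the encoding has already been identified; LineRewriter is a hypothetical helper. Byte-order-mark detection is turned off so the chosen encoding is always used, and NewLine is set to "\n" to keep UNIX line endings (every line, including the last, gets a trailing "\n").

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: copies a file line by line, decoding and re-encoding
// with the same explicit encoding so non-ASCII characters such as í survive.
public static class LineRewriter
{
    public static void Copy(string source, string destination, Encoding encoding, Func<string, string> transform)
    {
        using (var reader = new StreamReader(source, encoding, detectEncodingFromByteOrderMarks: false))
        using (var writer = new StreamWriter(destination, false, encoding)) // false = overwrite
        {
            writer.NewLine = "\n"; // keep UNIX line endings
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(transform(line));
            }
        }
    }
}
```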

Quoted from: How to find encoding of a file in Unix via script(s)

sschrass