I need to parse a CSV file I received. When I parsed it, all the characters came out corrupted. After some investigation, it seems the file is encoded with cp1255. I would rather avoid writing my own encoder. Is there another way to read the file in C#, or to convert it to UTF-8?

Edit:

private static Encoding encoding = Encoding.UTF8;
....

  var textReader = new StreamReader(reportCsv, encoding);
  var csv = new CsvReader(textReader, new Configuration { BadDataFound = null, Delimiter = delimiter, Encoding = encoding });

I have tried every encoding C# offered me, and none of them worked. After using tools to detect the file's encoding, I found it was cp1255, and I don't think I have a decoder/encoder for that.

I'm using the CsvHelper library to read the CSV file, but I believe the problem starts with the StreamReader.

Doctor Strange
  • Show us how you're reading the file, then you might get a useful, targeted answer. – spender Mar 19 '18 at 20:23
  • Relevant: [Convert a string's character encoding from windows-1252 to utf-8](//stackoverflow.com/q/5568033) – 001 Mar 19 '18 at 20:26
  • You can read an encoded text file with [`File.ReadAllText(path, encoding)`](https://msdn.microsoft.com/en-us/library/ms143369(v=vs.110).aspx), so you might choose to read it with `File.ReadAllText(path,Encoding.GetEncoding(1255))` – spender Mar 19 '18 at 20:26
  • Possible duplicate of [How to read text files with ANSI encoding and non-English letters?](https://stackoverflow.com/questions/12130290/how-to-read-text-files-with-ansi-encoding-and-non-english-letters) – Robert McKee Mar 19 '18 at 20:27
  • In light of your edit... Why not use `Encoding.GetEncoding(1255)` instead of `Encoding.UTF8`? – spender Mar 19 '18 at 20:28
  • I'm getting an exception: "NotSupportedException: No data is available for encoding 1255. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." I thought that encoding didn't exist. Am I missing something? – Doctor Strange Mar 19 '18 at 20:35

1 Answer

Since you did not paste any code, it is hard to help...

But the File read methods in C# have overloads with an encoding parameter, where you can specify the encoding of the file you are reading, e.g.:

File.ReadAllLines(String, Encoding)

An Encoding instance can be obtained from the Encoding class, e.g. for cp850:

Encoding encoding = Encoding.GetEncoding(850);

You can read more about encodings (nice article) here.

Additionally, I wonder why you want to convert it to UTF-8. C# strings are natively UTF-16.
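If a UTF-8 copy of the file is still wanted, a minimal sketch could look like the following. The class and method names (`Utf8Converter`, `ToUtf8`, `ConvertFile`) are hypothetical and chosen only for illustration; the calls underneath (`Encoding.Convert`, `File.ReadAllText`, `File.WriteAllText`) are standard .NET APIs. The source encoding is passed in by the caller, so obtaining a cp1255 instance is outside this sketch.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Re-encodes a byte buffer from the given source encoding to UTF-8.
    public static byte[] ToUtf8(byte[] source, Encoding sourceEncoding)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (sourceEncoding == null) throw new ArgumentNullException(nameof(sourceEncoding));
        return Encoding.Convert(sourceEncoding, Encoding.UTF8, source);
    }

    // Reads a whole file in the source encoding and writes it back as UTF-8 without a BOM.
    // Paths are supplied by the caller; nothing here is specific to the question's file.
    public static void ConvertFile(string inputPath, string outputPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(inputPath, sourceEncoding);
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }
}
```

In most cases the conversion is not needed at all: once the file has been decoded with the right encoding, the resulting strings can be used directly.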

EDIT

Because the project is based on .NET Core, additional code page registration was necessary, as described in this SO post.
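The registration mentioned above can be sketched roughly as follows. This assumes the `System.Text.Encoding.CodePages` package is available (on older .NET Core versions it has to be added as a NuGet reference; newer versions appear to ship it with the framework). The class name `Cp1255Reader` and its methods are hypothetical helpers, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;

public static class Cp1255Reader
{
    // Registers the code page provider so that legacy code pages such as 1255
    // become available, then returns the Windows-1255 (Hebrew) encoding.
    // Registering the same provider more than once is harmless.
    public static Encoding GetHebrewEncoding()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1255);
    }

    // Decodes an entire stream as Windows-1255 text.
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, GetHebrewEncoding()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's code, the same idea would mean replacing `Encoding.UTF8` with the encoding returned after registration, and passing the resulting `StreamReader` to `CsvReader` as before.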

dsdel
  • When I try `Encoding.GetEncoding(1255)`, I get "NotSupportedException: No data is available for encoding 1255. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." – Doctor Strange Mar 19 '18 at 20:31
  • Please take a look at the documentation of [Encoding.GetEncodings()](https://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings(v=vs.110).aspx). The article includes sample code that lists all encodings your runtime currently supports – dsdel Mar 19 '18 at 20:35
  • I tried that approach, but got very few results, none of them relevant. Could it be because I'm using .NET Core? – Doctor Strange Mar 19 '18 at 20:37
  • Indeed, please try the approach described in this [SO post](https://stackoverflow.com/questions/37870084/net-core-doesnt-know-about-windows-1252-how-to-fix) – dsdel Mar 19 '18 at 20:38
  • I see. Thanks for your answer, I wouldn't have thought of it if you hadn't brainstormed with me =). Please update your answer and I will mark it as accepted. – Doctor Strange Mar 19 '18 at 20:41