3

I'm using the CsvProvider on a CSV file exported from Excel. The CSV file contains some German-specific characters ('ä', 'ö', ...) that are not read correctly.

I tried the Encoding parameter, used a stream with UTF8 encoding, and saved the Excel file as ".csv" and as ".csv" in MS-DOS format, ... but the result was always the same.

Here is a reduced code sample:

open FSharp.Data

type MyCsvProvider = CsvProvider<"Mappe1.csv",Separators=";">

[<EntryPoint>]
let main argv = 
    let csvDatas = MyCsvProvider.Load("..\..\Mappe1.csv")

    for i in csvDatas.Rows do
        printfn "Read: Beispiel='%s' and Änderungsjahr='%d" i.Beispiel i.``�nderungsjahr`` 

    0 

Here is the corresponding CSV file:

Beispiel;Änderungsjahr
Data1;2000
Überlegung;2010

And here is the result after execution:

Read: Beispiel='Data1' and Änderungsjahr='2000
Read: Beispiel='?berlegung' and Änderungsjahr='2010
Fyodor Soikin
  • The correct title should be "Encoding of "ä", "ö", ... doesn't work properly" – Christophe Grévent Jul 03 '16 at 09:45
  • What encoding does your file use? – Fyodor Soikin Jul 03 '16 at 10:11
  • I had a look at Excel, but there is no encoding option there... Currently, I use Excel to define the file, then I save it as .csv... I'm not doing more than this... I'm using a German-localized Windows 10, where I didn't change anything. Did I miss an encoding option somewhere while generating the CSV? – Christophe Grévent Jul 04 '16 at 07:27

2 Answers

0

I'm not into F#, but I reckon it could be a locale setting for the console. Set a breakpoint and check the actual byte values of the string with the debugger.

For instance, Überlegung starts with Ü, which has the code 0xDC (in Latin-1 and Unicode; strictly speaking it lies outside ASCII). If you get this value from the debugger, the string was read correctly and only the console locale needs to be set.
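The check described above can also be done without a debugger. The following is only a minimal sketch: it assumes the file holds 'Ü' as the single byte 0xDC (as a Latin-1 style code page would write it), and the names `bytes`, `asLatin1` and `asUtf8` are made up for illustration.

```fsharp
open System.Text

// Assumption: the CSV stores "Über" with 'Ü' as the single byte 0xDC.
let bytes = [| 0xDCuy; 0x62uy; 0x65uy; 0x72uy |]

// ISO-8859-1 (Latin-1) maps byte 0xDC to U+00DC, i.e. 'Ü'.
let latin1 = Encoding.GetEncoding("iso-8859-1")

let asLatin1 = latin1.GetString bytes
let asUtf8 = Encoding.UTF8.GetString bytes

// Print the code of the first character, as one would inspect it in the debugger.
printfn "Latin-1 first char: 0x%04X" (int asLatin1.[0])
printfn "UTF-8 first char:   0x%04X" (int asUtf8.[0])
```

If the first character comes out as 0x00DC, the data was decoded correctly and only the console is at fault. If it is 0xFFFD (the Unicode replacement character), the damage happened while reading the file.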

Try having a look at this SO question about setting the locale. Even though it is for C++, it should be something you can adapt to your environment.

Community
Lookaji
  • It's a real F# + FSharp.Data topic... The console is able to display UTF-8 output (as the 'Ä' of "Änderungsjahr" in the execution result shows). I think it's the way the FSharp.Data library parses the .csv file... It doesn't use the right encoding, and I don't know how to correct it... – Christophe Grévent Jul 03 '16 at 21:45
0

OK, I found the problem: saving as CSV in Excel generates more or less ASCII, not UTF. The format to use is "Unicode (Text)", which generates real Unicode, with '\t' as the separator instead of ';' or ','. Works for me... So I'm closing the question... Thanks to all!
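As an alternative to re-exporting the file, the existing CSV can probably be read with a matching encoding. The sketch below is not the poster's solution. It assumes that Excel's plain CSV export used a Latin-1 compatible code page, and it writes its own copy of the sample data to a hypothetical temp path so that it runs on its own.

```fsharp
open System.IO
open System.Text

// Assumption: the plain "CSV" export used a Latin-1 compatible ANSI code page.
let latin1 = Encoding.GetEncoding("iso-8859-1")

// Recreate the sample file from the question in that encoding (hypothetical path).
let path = Path.Combine(Path.GetTempPath(), "Mappe1-latin1.csv")
let sample = "Beispiel;\u00C4nderungsjahr\nData1;2000\n\u00DCberlegung;2010\n"
File.WriteAllText(path, sample, latin1)

// Read the whole file with an explicitly chosen encoding.
let readWith (encoding: Encoding) =
    use reader = new StreamReader(path, encoding)
    reader.ReadToEnd()

let wrong = readWith Encoding.UTF8   // umlauts decode to U+FFFD
let right = readWith latin1          // umlauts survive

printfn "%s" right
```

A `StreamReader` opened this way is a `TextReader`, so it could presumably be handed to the provider's `Load` overload that accepts a reader. Whether that overload and the `Encoding` static parameter behave as expected in the installed FSharp.Data version should be checked against its documentation.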