I have values in a .csv file that look like this:

drieëntachtig
één

Now I try to read in the relevant values using the following commands:

import pandas as pd

df = pd.read_csv('test.csv', sep=";")
numbers = df['numbers'].tolist()

However, when I look at the values now, I see this in my console:

drie�ntachtig
��n

Could anybody tell me how to read the values in Python 2.7 so that I get the normal values? I already tried:

df = pd.read_csv('test.csv', sep=";", encoding= "uft8")
Henk Straten
  • You should write encoding, not encodeing. Does it work like that? – Horia Coman May 05 '17 at 07:54
  • No, this was a typo... – Henk Straten May 05 '17 at 08:02
  • It seems to me like this isn't an issue with Python, but with the console you are outputting data to. Could you try opening a console/command prompt and entering `echo één`? – alxwrd May 05 '17 at 08:11
  • Possible duplicate of [Reading a UTF8 CSV file with Python](http://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python) – eli-bd May 05 '17 at 08:17
  • Are you sure that the encoding of the file is UTF8? The special characters shown here could be Latin1 or Latin9 or win1252... A simple option is to use an editor able to process different codepages, like the excellent [gvim](http://www.vim.org) (multi-platform) or [notepad++](https://notepad-plus-plus.org/) (Windows) – Serge Ballesta May 05 '17 at 08:31
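
The check suggested in the last comment can also be done in code. The following is a minimal sketch, not a definitive detector: `guess_encoding` and its candidate list are hypothetical names introduced here, and the sample bytes assume the file was saved by an editor using a Latin-1 style code page. It returns the first candidate that decodes the bytes without error.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper: the candidate list is an assumption, and latin-1
    is kept last because it accepts every possible byte sequence.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Assumed sample: the two words from the question as single-byte text
# (0xEB is e-diaeresis, 0xE9 is e-acute in Latin-1 and cp1252).
sample = b"drie\xebntachtig\n\xe9\xe9n\n"
print(guess_encoding(sample))
```

For a real file, the bytes could be read with `open('test.csv', 'rb').read()`. Note that a successful decode only shows an encoding is possible, not that it is the right one: these sample bytes decode identically under cp1252 and Latin-1.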

1 Answer

You could specify the latin encoding when you read the CSV file in pandas (see standard-encodings for the accepted names):

df = pd.read_csv('character.csv', sep=";", encoding='latin')

Suppose you have content in character.csv:

test
drieëntachtig
één
banana
orange
apple

Then, when you print df, it will give you:

        test
0   drieëntachtig
1   één
2   banana
3   orange
4   apple
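
To verify the result independently of the console (which, as noted in the comments, may itself be the source of the garbled display), here is a minimal sketch. It assumes the file is Latin-1 encoded and uses an in-memory `io.BytesIO` buffer as a hypothetical stand-in for `test.csv`, with the `numbers` column from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: a "numbers" column saved as Latin-1 bytes.
raw = u"numbers\ndrie\xebntachtig\n\xe9\xe9n\n".encode("latin-1")

# read_csv accepts file-like objects; the encoding is assumed to be Latin-1.
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="latin-1")
numbers = df["numbers"].tolist()

# repr shows escaped code points, so the check does not depend on how the
# console renders non-ASCII characters.
print([repr(n) for n in numbers])
```

If the escaped values contain `\xeb` and `\xe9`, the data was decoded correctly and any remaining garbling comes from the console rather than from pandas.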
Tiny.D