
I have a 500MB+ file that was generated by saving a large Excel spreadsheet as Unicode text. I am running Windows 7.

I need to open the file with Python pandas. Until now I converted the file from ANSI to UTF-8 with Notepad++ and then opened it, but the file is now too large for Notepad++.

The file contains Hebrew, French, Swedish, Norwegian, and Danish special characters.

  • Pandas' read_excel is just too slow: I let it run for several minutes without seeing any output.
  • iconv: apparently I cannot get the encoding right; I just get a list of tab-separated nulls from everything I have tried (see the byte-check sketch after these commands):

    iconv -f "CP858" -t "UTF-8" file1.txt > file2.txt

    iconv -f "windows-1252" -t "UTF-8" file1.txt > file2.txt
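
For reference, a minimal sketch (assuming the file name file1.txt from the commands above) that peeks at the first bytes of the file. Excel's "Unicode Text" export is normally UTF-16LE: it starts with the BOM b'\xff\xfe' and stores ASCII characters with interleaved null bytes, which is exactly what a single-byte decode such as CP858 or windows-1252 shows as tab-separated nulls.

    # Peek at the first bytes to guess the real encoding.
    with open("file1.txt", "rb") as f:
        head = f.read(32)

    print(head)  # e.g. b'\xff\xfeC\x00o\x00l\x00...'
    if head.startswith(b"\xff\xfe"):
        print("Looks like UTF-16LE (little-endian BOM)")
    elif head.startswith(b"\xfe\xff"):
        print("Looks like UTF-16BE (big-endian BOM)")
    elif head.startswith(b"\xef\xbb\xbf"):
        print("Looks like UTF-8 with BOM")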

Edit

iconv -f "UTF-16le" -t "UTF-8" file1.txt > file2.txt leads to very weird behaviour: a row somewhere in the middle gets cut. Everything looks fine, but only 80K rows are actually converted.
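
In case iconv keeps truncating, a minimal sketch of doing the conversion in Python instead, streaming line by line so the 500MB+ file never sits in memory at once (file names follow the iconv commands above; the final count makes a silent truncation like the 80K-row cut easy to spot):

    # Stream-convert UTF-16 (BOM-aware) to UTF-8 one line at a time.
    import io

    count = 0
    with io.open("file1.txt", "r", encoding="utf-16") as src, \
         io.open("file2.txt", "w", encoding="utf-8", newline="") as dst:
        for count, line in enumerate(src, 1):
            dst.write(line)
    print(count, "lines written")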

Edit 2

read_csv with encoding='utf-16le' reads the file properly. However, I still don't get why iconv messes it up.


1 Answer


read_csv with encoding='utf-16le' reads the file properly. However, I still don't get why iconv messes it up.
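
For completeness, a minimal sketch of that approach, assuming the export is tab-separated (the usual layout of Excel's "Unicode Text" format) and reading in chunks to keep memory under control for a 500MB+ file:

    import pandas as pd

    # Read the tab-separated UTF-16LE export in chunks, then stitch together.
    chunks = pd.read_csv("file1.txt", encoding="utf-16le", sep="\t",
                         chunksize=100000)
    df = pd.concat(chunks, ignore_index=True)
    print(df.shape)

If the first column name comes back with a stray BOM character, encoding='utf-16' (which strips the BOM during decoding) usually cleans it up.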
