I have a 500MB+ file that was generated by saving a large Excel spreadsheet as Unicode text. I am running Windows 7. I need to open the file with Python pandas. So far my workflow was to open the file with Notepad++ and convert it from ANSI to UTF-8, but the file is now too large for Notepad++. The data contains Hebrew, French, Swedish, Norwegian and Danish special characters.
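For context, this is the conversion I am trying to reproduce without Notepad++. A minimal Python sketch that streams the file line by line, assuming the input is actually UTF-16 (which is what Excel's "Unicode Text" export writes, and which Edit 2 below seems to confirm) rather than ANSI; the file names are placeholders:

    # Streaming UTF-16 -> UTF-8 re-encode; lazy iteration keeps memory
    # flat, so the 500MB file never has to be loaded at once.
    # Assumption: the source is UTF-16 with a BOM ("utf-16" auto-detects
    # the byte order); adjust the encoding if a BOM check says otherwise.
    with open("file1.txt", encoding="utf-16") as src, \
            open("file2.txt", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)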
- Pandas' read_excel is just too slow: I let it run for several minutes without seeing any output.
- iconv: apparently I cannot get the encoding right; I just get a list of tab-separated nulls (a BOM check, sketched after this list, would settle what the file actually is) when I try:

    iconv -f "CP858" -t "UTF-8" file1.txt > file2.txt
    iconv -f "windows-1252" -t "UTF-8" file1.txt > file2.txt
Edit

    iconv -f "UTF-16le" -t "UTF-8" file1.txt > file2.txt

leads to very weird behaviour: the output is cut off in the middle of a row. Everything looks fine, but only 80K rows are actually converted.
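My guess (an assumption, not something I have verified) is that iconv stops at the first byte sequence that is not valid UTF-16LE, since by default it aborts on malformed input. A sketch to locate the offending spot; it reads the whole file into memory, which is workable at 500MB but worth noting:

    # Decode strictly and report where the first invalid UTF-16LE
    # sequence sits; that is presumably where iconv gives up.
    with open("file1.txt", "rb") as f:
        data = f.read()
    try:
        data.decode("utf-16le")
        print("decoded cleanly")
    except UnicodeDecodeError as e:
        print("first bad sequence at byte offset", e.start)
        print("bytes:", repr(data[e.start:e.start + 8]))

If it really is just a stray byte, re-encoding in Python with errors='replace' (or running iconv with -c to drop invalid sequences) should push past it.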
Edit 2

read_csv with encoding='utf-16le' reads the file properly. However, I still don't get why iconv messes it up.
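For completeness, the working call, read in chunks so the 500MB file never has to sit in memory all at once. The sep='\t' is an assumption based on Excel's tab-separated "Unicode Text" export, and the chunk size is arbitrary:

    import pandas as pd

    # Stream the file in 100k-row chunks, then stitch them together.
    # encoding='utf-16le' matches what worked above; 'utf-16' would also
    # consume the BOM automatically if one is present.
    chunks = pd.read_csv("file1.txt", encoding="utf-16le", sep="\t",
                         chunksize=100000)
    df = pd.concat(chunks, ignore_index=True)
    print(df.shape)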