
I have a 500MB+ file that was generated by saving a large Excel spreadsheet as Unicode text. I am running Windows 7.

I need to open the file with Python pandas. Until now I converted the file from ANSI to UTF-8 with Notepad++ and then opened it, but the file is now too large for Notepad++.

The file contains Hebrew, French, Swedish, Norwegian, and Danish special characters.

  • Pandas' read_excel is just too slow: I let it run for several minutes without seeing any output.
  • iconv: apparently I cannot get the encoding right; I just get a list of tab-separated nulls from everything I have tried (see the byte-check sketch after these commands):

    iconv -f "CP858" -t "UTF-8" file1.txt > file2.txt

    iconv -f "windows-1252" -t "UTF-8" file1.txt > file2.txt
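
For reference, a minimal sketch (assuming the file name file1.txt from the commands above) that peeks at the first bytes of the file. Excel's "Unicode Text" export is normally UTF-16LE: it starts with the BOM b'\xff\xfe' and stores ASCII characters with interleaved null bytes, which is exactly what a single-byte decode such as CP858 or windows-1252 shows as tab-separated nulls.

    # Peek at the first bytes to guess the real encoding.
    with open("file1.txt", "rb") as f:
        head = f.read(32)

    print(head)  # e.g. b'\xff\xfeC\x00o\x00l\x00...'
    if head.startswith(b"\xff\xfe"):
        print("Looks like UTF-16LE (little-endian BOM)")
    elif head.startswith(b"\xfe\xff"):
        print("Looks like UTF-16BE (big-endian BOM)")
    elif head.startswith(b"\xef\xbb\xbf"):
        print("Looks like UTF-8 with BOM")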

Edit

iconv -f "UTF-16le" -t "UTF-8" file1.txt > file2.txt leads to very weird behaviour: a row somewhere in the middle gets cut. Everything looks fine, but only 80K rows are actually converted.
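
In case iconv keeps truncating, a minimal sketch of doing the conversion in Python instead, streaming line by line so the 500MB+ file never sits in memory at once (file names follow the iconv commands above; the final count makes a silent truncation like the 80K-row cut easy to spot):

    # Stream-convert UTF-16 (BOM-aware) to UTF-8 one line at a time.
    import io

    count = 0
    with io.open("file1.txt", "r", encoding="utf-16") as src, \
         io.open("file2.txt", "w", encoding="utf-8", newline="") as dst:
        for count, line in enumerate(src, 1):
            dst.write(line)
    print(count, "lines written")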

Edit 2

read_csv with encoding='utf-16le' reads the file properly. However, I still don't get why iconv messes it up.


1 Answer


read_csv with encoding='utf-16le' reads the file properly. However, I still don't get why iconv messes it up.
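
For completeness, a minimal sketch of that approach, assuming the export is tab-separated (the usual layout of Excel's "Unicode Text" format) and reading in chunks to keep memory under control for a 500MB+ file:

    import pandas as pd

    # Read the tab-separated UTF-16LE export in chunks, then stitch together.
    chunks = pd.read_csv("file1.txt", encoding="utf-16le", sep="\t",
                         chunksize=100000)
    df = pd.concat(chunks, ignore_index=True)
    print(df.shape)

If the first column name comes back with a stray BOM character, encoding='utf-16' (which strips the BOM during decoding) usually cleans it up.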
