
I have a text file containing ~7 million rows of text, encoded in UTF-16.

70357719 new.file

new.file: text/plain; charset=utf-16le

When I set pandas read_csv's encoding to utf-16, it only imports a fraction of the rows.

Using the following test code:

import pandas as pd 
data = pd.read_csv('new.file',names=['Text'],sep="\n")
print "Plain:",len(data)

data = pd.read_csv('new.file',names=['Text'],encoding="utf-16",sep="\n")
print "utf-16",len(data)

I get the following output:

'Plain:', 215585254
'utf-16', 65446415
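One likely reason the two counts disagree (a sketch of the mechanism, not a diagnosis of this exact file): without `encoding=`, the raw UTF-16 bytes get split on every 0x0A byte, and 0x0A can appear both as half of an encoded `'\n'` and inside the code units of unrelated characters, so the "Plain" count is meaningless. A small self-contained demonstration (the file name `tiny.txt` is made up here):

```python
# Sketch (not from the question): why counting b"\n" in the raw bytes of a
# UTF-16 file is misleading. We build a tiny 3-line UTF-16-LE file in which
# one character, u'\u010a', happens to contain an 0x0A byte.
import io

lines = [u"alpha", u"bet\u010aa", u"gamma"]
with io.open("tiny.txt", "w", encoding="utf-16-le") as f:
    f.write(u"\n".join(lines) + u"\n")

# Raw byte view: 3 real newlines plus one phantom 0x0A from u'\u010a'.
with open("tiny.txt", "rb") as f:
    raw = f.read()
raw_newline_bytes = raw.count(b"\n")

# Decoded view: the real line count.
with io.open("tiny.txt", encoding="utf-16-le") as f:
    decoded_line_count = sum(1 for _ in f)

print("raw 0x0A bytes: %d, decoded lines: %d"
      % (raw_newline_bytes, decoded_line_count))  # 4 vs 3
```

The mismatch can cut both ways at scale, which is why neither of the two printed counts above should be trusted until the file is decoded properly.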

I'm using Python 2.7, and I have already checked the file for empty rows (there are none).

Basically, I'm at a loss for what to try next; I need every row of this file to be imported.
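One next step worth trying (a hedged sketch, not a verified fix for this exact file): let Python itself decode the stream and count lines, to get a ground truth that is independent of pandas. The helper name `count_decoded_lines` and the file `demo.txt` below are illustrative, not from the question:

```python
# Sketch: count lines by decoding the UTF-16 stream line by line,
# independently of pandas.
import io

def count_decoded_lines(path, encoding="utf-16"):
    # "utf-16" honours a BOM if present; the question's file is utf-16le.
    with io.open(path, encoding=encoding) as f:
        return sum(1 for _ in f)  # streams lazily; fine for millions of rows

# Stand-in file for the demo; the real file would be checked with
# count_decoded_lines("new.file").
with io.open("demo.txt", "w", encoding="utf-16") as f:
    f.write(u"\n".join(u"row %d" % i for i in range(1000)) + u"\n")

print(count_decoded_lines("demo.txt"))  # 1000
```

If that count matches the expected ~7 million, the file decodes cleanly and the loss is happening on the pandas side; since `pd.read_csv` also accepts an already-open file object, passing it a decoded handle (`io.open('new.file', encoding='utf-16')`) instead of a path plus `encoding=` may then be worth trying.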

F.D
  • Take a look: https://stackoverflow.com/questions/38728366/pandas-cannot-load-data-csv-encoding-mystery and https://stackoverflow.com/questions/55316476/pandas-read-csv-not-reading-all-rows – rafaelc Mar 23 '19 at 17:52
  • Why are you using sep="\n"? – Burrito Mar 23 '19 at 17:53
  • RafaelC, the second link goes back to this question. | Burrito, to separate each line into a row; I'm aware names= would also do this. – F.D Mar 23 '19 at 17:56

0 Answers