
I have been trying to read a few large text files (around 1.4 GB to 2 GB each) with Pandas, using the read_csv function, to no avail. Below are the versions I am using:

  • Python 2.7.6
  • Anaconda 1.9.2 (64-bit) (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
  • IPython 1.1.0
  • Pandas 0.13.1

I tried the following:

df = pd.read_csv('data.txt')

and it crashed IPython with the message: Kernel died, restarting.

Then I tried using an iterator:

tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)

Again, I got the Kernel died, restarting error.
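(For reference, the reader returned by this call is normally consumed like the following; a minimal sketch, assuming the read itself does not crash:)

chunk = tp.get_chunk()  # returns the next 1000 rows as a DataFrame
# or loop over all the chunks:
for chunk in tp:
    pass  # work on each 1000-row DataFrame here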

Any ideas? Or any other way to read big text files?

Thank you!

marillion
  • I did not get this error on my machine, with a configuration similar to yours. How much RAM do you have? On my machine Python needed a peak of around 5 GB to read a 2.9 GB CSV using `pd.read_csv()` – Saullo G. P. Castro May 01 '14 at 16:25
  • @SaulloCastro My machine has 8 GB installed. It should be able to handle such a file size, since most of the installed RAM is available; I am not running anything else. – marillion May 01 '14 at 16:38

1 Answer


A solution for a similar question was given here some time after this question was posted. Basically, it suggests reading the file in chunks, like this:

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

You should choose the chunksize parameter according to your machine's capabilities (that is, make sure it can hold and process one chunk at a time).
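If the end goal is still a single DataFrame, and what you keep from each chunk is small enough to fit in memory, the pieces can be collected and concatenated afterwards (a sketch; the column name and filter condition are hypothetical placeholders):

import pandas as pd

chunksize = 10 ** 6  # rows per chunk
parts = []
for chunk in pd.read_csv(filename, chunksize=chunksize):
    # keep only the rows/columns you actually need so the combined result fits in RAM
    parts.append(chunk[chunk['value'] > 0])  # 'value' is a hypothetical column

df = pd.concat(parts, ignore_index=True)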

Laurent S
DarkCygnus
  • What is 10 ** 6? Please enlighten us lesser enlightened ones. Also, this does not show how to store each chunk in a DataFrame and concatenate all of them afterwards. – Rahul Saini Jul 09 '19 at 16:47
  • That "10 raised to the power 6" is not intuitive. What is it: KB, MB, lines in the file? – Rahul Saini Jul 09 '19 at 16:53
  • Perhaps a more explanatory and useful link could be mentioned here: https://pythondata.com/working-large-csv-files-python/ – Rahul Saini Jul 09 '19 at 16:56
  • Oh, sorry, I didn't get you quite right. It's the number of rows per chunk. – DarkCygnus Jul 09 '19 at 16:57
  • I suggest you check the target duplicate question, as it has relevant and useful info for you :) Thanks for the link too; I will check it out. – DarkCygnus Jul 09 '19 at 16:57