5

I have a really large CSV file, about 10 GB. Whenever I try to read it into an IPython notebook using

data = pd.read_csv("data.csv")  

my laptop gets stuck. Is it possible to read just part of the file, say 10,000 rows or 500 MB?

John Constantine
  • Take a look at the `iterator` and `chunksize` options to process the file in chunks. – Barmar Sep 22 '17 at 01:17
  • 1
    did you try to read the documentation at all?? [read csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html), hint! look at `nrows=` – DJK Sep 22 '17 at 01:18
  • 1
    @djk47463 Is it possible to get random rows using `nrows=`? – John Constantine Sep 22 '17 at 17:13

2 Answers

13

It is possible. Passing your desired chunksize to read_csv gives you an iterator that yields the CSV as DataFrames of at most that many rows at a time. Setting chunksize is enough to get the iterator; iterator=True is optional alongside it.

df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)

for iter_num, chunk in enumerate(df_iter, 1):
    print(f'Processing iteration {iter_num}')
    # do things with chunk

Or, more briefly:

for chunk in pd.read_csv('data.csv', chunksize=10000):
    ...  # do things with chunk
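As a rough sketch of what "do things with chunk" might look like, the example below keeps only running totals, so no more than one chunk is in memory at a time. The sample file, the column names `id` and `value`, and the chunk size are all made up for illustration; substitute your own.

```python
import os
import tempfile

import pandas as pd

# Build a small stand-in for the large file (hypothetical data, illustration only).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pd.DataFrame({"id": range(25), "value": range(0, 50, 2)}).to_csv(csv_path, index=False)

total_rows = 0
value_sum = 0

# Each chunk is an ordinary DataFrame holding at most `chunksize` rows.
for chunk in pd.read_csv(csv_path, chunksize=10):
    total_rows += len(chunk)
    value_sum += chunk["value"].sum()

print(total_rows, value_sum)
```

If you need the rows themselves rather than an aggregate, filter each chunk first and collect only the filtered pieces, so the combined result stays small.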

Alternatively, if you only want a specific part of the CSV, you can use the skiprows and nrows options to start at a particular line and read the next n rows.
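A minimal sketch of the skiprows / nrows approach, assuming a file with a header row. The sample data and the `start` and `count` values are hypothetical. Passing a range that begins at 1 to skiprows skips data lines while keeping line 0, the header.

```python
import os
import tempfile

import pandas as pd

# Small stand-in file (hypothetical data, illustration only).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pd.DataFrame({"id": range(100), "value": range(100, 200)}).to_csv(csv_path, index=False)

# Read only the first 10 data rows.
head = pd.read_csv(csv_path, nrows=10)

# Read 10 rows starting at data row 20.
# File line 0 is the header, so skipping lines 1..20 drops data rows 0..19.
start, count = 20, 10
window = pd.read_csv(csv_path, skiprows=range(1, start + 1), nrows=count)

print(head.shape, window["id"].tolist())
```

Note that a plain integer, as in `skiprows=20`, skips the first 20 lines of the file including the header, which is why the range form is used here.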

miradulo
  • I'm trying to understand the meaning of the `iterator` parameter in `read_csv()`. Does it make any difference when we set `iterator=True` (the default value is False)? I've googled it, but that didn't help. Thanks. – Chau Pham Jan 18 '19 at 04:04
0

This is likely a memory issue: read_csv tries to load the whole file into memory at once. You can pass chunksize (the number of rows per chunk) to read_csv to read the file in pieces instead.

Alternatively, if you don't need all the columns, pass usecols to read_csv to import only the columns you need.
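For illustration, here is a small sketch of usecols. The file contents and column names are invented for the example. The dtype argument is an optional extra that was not part of the suggestion above; narrower types can reduce memory use further if your values fit in them.

```python
import os
import tempfile

import pandas as pd

# Small stand-in file with three columns (hypothetical data, illustration only).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pd.DataFrame(
    {"id": range(5), "name": list("abcde"), "value": [0.5, 1.5, 2.5, 3.5, 4.5]}
).to_csv(csv_path, index=False)

# Load only two of the three columns; "name" is never loaded into the DataFrame.
subset = pd.read_csv(
    csv_path,
    usecols=["id", "value"],
    dtype={"id": "int32", "value": "float32"},  # optional: smaller dtypes
)

print(subset.dtypes)
```

usecols can be combined with chunksize or nrows when the file is both wide and long.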

user3212593
  • 1
    Unless you provide an example, this is more of a comment and what you have said here matches exactly to what @Mitch already answered... – DJK Sep 22 '17 at 01:23