Importing only a few columns of a csv as a python pandas dataframe?

Question

I would like to import only a subset of a CSV as a DataFrame, as the file is too large to import in full. Is there a way to do this natively in pandas without having to set up a database-like structure?

I have tried importing the file in chunks and then concatenating them, but the result is still too large and causes a memory error. I have hundreds of columns, so manually specifying dtypes could help but would likely be a major time commitment.

import pandas as pd
df_chunk = pd.read_csv("filename.csv", chunksize=10**7)
df = pd.concat(df_chunk, ignore_index=True)

Possible duplicate of [How to read a 6 GB csv file with pandas](https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas) — gosuto, Oct 21 '19 at 18:09

Sheldon · Answer 1 · 2019-10-21T18:11:16.507

2

You can use the skiprows and nrows arguments of read_csv to load only a subset of rows from the CSV file.

For instance:

 import pandas as pd
 # An integer skiprows counts from the top of the file, so the header line is skipped too.
 df = pd.read_csv("test.csv", skiprows=4, nrows=10)
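If the header row should be kept while skipping data rows, skiprows also accepts a list-like of line indices. A minimal sketch, assuming a made-up two-column file (the names `col_a` and `col_b` and the in-memory `csv_text` are placeholders, not from the question):

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv: a header line plus 20 data lines.
csv_text = "col_a,col_b\n" + "".join(f"{i},{i * 10}\n" for i in range(20))

# range(1, 5) skips file lines 1-4 (the first four data rows) but keeps
# line 0, so the original header is still used for the column names.
window = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5), nrows=10)

print(window.head())
```

With a real file, pass the path in place of `io.StringIO(csv_text)`.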

edited Oct 21 '19 at 18:11

answered Oct 21 '19 at 18:08

Sheldon


I need all rows. what about columns? – Bstampe Oct 21 '19 at 18:11

For that you can use the *usecols* parameter. – Sheldon Oct 21 '19 at 18:15
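A minimal sketch of the usecols approach from the comment above. The column names (`id`, `score`) and the small in-memory CSV are hypothetical stand-ins for the real file; the optional dtype mapping is an extra assumption that only has to cover the selected columns, not all of them:

```python
import io
import pandas as pd

# Hypothetical stand-in for the large file; in practice pass the CSV path.
csv_text = "id,name,score,notes\n1,a,0.5,x\n2,b,0.7,y\n3,c,0.9,z\n"

# usecols keeps all rows but only the listed columns end up in the DataFrame.
wanted = ["id", "score"]
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Optional: dtypes are needed only for the selected columns, which keeps
# the manual effort small even when the file has hundreds of columns.
df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=wanted,
    dtype={"id": "int32", "score": "float32"},
)

print(df_small.dtypes)
```

usecols can be combined with chunksize if even the reduced set of columns does not fit in memory at once.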
