I would like to import only a subset of a CSV file as a DataFrame, because the file is too large to import in full. Is there a way to do this natively in pandas, without setting up a database-like structure?

I have tried importing the file in chunks and then concatenating them, but the result is still too large and causes a memory error. The file has hundreds of columns, so manually specifying dtypes could help, but it would likely be a major time commitment.

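import pandas as pd
# Read the file in chunks of 10 million rows, then concatenate every chunk
df_chunk = pd.read_csv("filename.csv", chunksize=10_000_000)
df = pd.concat(df_chunk, ignore_index=True)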
Bstampe

1 Answer

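You can use the skiprows and nrows arguments of read_csv to load only a subset of rows from the CSV file. Note that an integer skiprows counts lines from the very top of the file, so it also skips the header line; pass a range or list of line numbers if the header should be kept.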

For instance:

 import pandas as pd
 # Keep the header (line 0), skip the first 4 data rows, then read the next 10 rows
 df = pd.read_csv("test.csv", skiprows=range(1, 5), nrows=10)
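If the subset you need is defined by columns or by a condition on the values rather than by row position, one possible approach is to combine usecols with chunksize and filter each chunk before concatenating, so that only the rows you keep stay in memory. The following is a minimal sketch, not a definitive solution: the helper name read_subset, its keep_row parameter, and the sample column names are hypothetical and only serve the illustration.

```python
import io
import pandas as pd

def read_subset(source, columns, keep_row, chunksize=100_000):
    """Read only `columns`, chunk by chunk, keeping rows where keep_row(chunk) is True.

    read_subset and keep_row are hypothetical names used for this sketch.
    """
    parts = []
    for chunk in pd.read_csv(source, usecols=columns, chunksize=chunksize):
        # Filter before storing, so discarded rows never accumulate in memory
        parts.append(chunk[keep_row(chunk)])
    return pd.concat(parts, ignore_index=True)

# Tiny in-memory stand-in for the large CSV file
sample = io.StringIO("id,value,extra\n1,10,a\n2,20,b\n3,30,c\n4,40,d\n")
df = read_subset(
    sample,
    columns=["id", "value"],
    keep_row=lambda c: c["value"] > 15,
    chunksize=2,
)
print(df)
```

With a real file you would pass its path instead of the StringIO object and choose a chunksize that fits comfortably in memory. This only helps if the filtered result is itself small enough to hold.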
Sheldon