Closely related to "Read a small random sample from a big CSV file into a Python data frame".

I have a very big CSV file with the columns patient_id and visit_data. I want to read a small sample from it, but whenever a patient is sampled, I want all of that patient's records to be included.

ihadanny

1 Answer

If you want to keep working with the .csv file, you can read it in chunks, select the relevant rows from each chunk, and concatenate them, along the lines below (see docs):

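import pandas as pd

# patient_id, filename and chunksize are assumed to be defined beforehand.
# Collect the matching rows from each chunk, then concatenate once at the end.
matches = []
for chunk in pd.read_csv(filename, chunksize=chunksize):
    matches.append(chunk[chunk.patient_id == patient_id])
patient = pd.concat(matches)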
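For the sampling case in the question, the same chunked reading could be run twice: once to collect the distinct patient_id values, and once to keep every row that belongs to a random subset of them. The following is only a minimal sketch under assumptions; the function name sample_patients and its parameters n_patients, chunksize and seed are made up for illustration, and it assumes the set of distinct ids fits in memory.

```python
import random

import pandas as pd


def sample_patients(filename, n_patients, chunksize=100_000, seed=None):
    """Return all rows for a random sample of up to n_patients patients.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Pass 1: read only the id column and collect the distinct patients.
    ids = set()
    for chunk in pd.read_csv(filename, usecols=["patient_id"], chunksize=chunksize):
        ids.update(chunk["patient_id"].unique())

    # Draw the patients (sorted first so a given seed is reproducible).
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(ids), min(n_patients, len(ids))))

    # Pass 2: keep every row whose patient was chosen.
    parts = [
        chunk[chunk["patient_id"].isin(chosen)]
        for chunk in pd.read_csv(filename, chunksize=chunksize)
    ]
    return pd.concat(parts, ignore_index=True)
```

The file is read twice, which trades extra I/O for never holding more than one chunk plus the id set in memory.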

However, I would recommend looking at HDF5 storage via pandas, which lets you select rows with queries on indexed data instead of iterating through the whole file. There are also, of course, various SQL-based options (see basic example).
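As a rough illustration of the SQL route, the CSV could be loaded once into an indexed SQLite table and then queried per patient. This is a sketch under assumptions rather than a recommendation of a specific setup: the table name visits, the index name idx_patient and the helper names csv_to_sqlite and load_patient are hypothetical, and SQLite is just one possible backend.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def csv_to_sqlite(filename, db_path, chunksize=100_000):
    """Load the CSV chunk by chunk into a SQLite table (hypothetical name: visits)."""
    with closing(sqlite3.connect(db_path)) as conn:
        for chunk in pd.read_csv(filename, chunksize=chunksize):
            chunk.to_sql("visits", conn, if_exists="append", index=False)
        # Index the id column so per-patient lookups avoid a full table scan.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_patient ON visits (patient_id)"
        )
        conn.commit()


def load_patient(db_path, patient_id):
    """Return all rows for one patient as a DataFrame."""
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(
            "SELECT * FROM visits WHERE patient_id = ?",
            conn,
            params=(patient_id,),
        )
```

The load step is paid once; after that, each lookup touches only the rows for the requested patient.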

Stefan