I have a parquet file that is about 350 GB in size, so I want to read the data in chunks.

I know how to read the full parquet file and then convert it to pandas, as below:

import pyarrow.parquet as pq
table = pq.read_table(filepath)
df = table.to_pandas(integer_object_nulls=True)

I am not sure whether it is possible to read the data chunk by chunk. Can someone please clarify this?
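
Would something along these lines be the right approach? This is only a rough sketch of what I have in mind, using pyarrow's ParquetFile.iter_batches (the batch_size of 100_000 is just a placeholder and I have not tested this on my data):

import pyarrow.parquet as pq

parquet_file = pq.ParquetFile(filepath)  # opens metadata only, not the full 350 GB

# iterate over bounded-size record batches and convert each one to pandas
for batch in parquet_file.iter_batches(batch_size=100_000):
    chunk_df = batch.to_pandas(integer_object_nulls=True)
    # ... process chunk_df here (e.g. add the extra columns) ...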

  • Have you considered using [dask](https://dask.org/)? Pandas doesn't have the option to read by chunk. – rpanai Jun 29 '21 at 13:56
  • I need to use an already developed framework to add some extra columns (it uses pandas). I do not have much knowledge of dask. Is it possible to read the data chunk-wise in dask and then convert to pandas, so I can use my other Python code to add the new fields? – Py1996 Jun 29 '21 at 14:05
  • You can read the big parquet file with dask and save in many smaller parquet or use basically all your code to add new columns. If you can produce a [mcve](/help/mcve) I can try to help you. – rpanai Jun 29 '21 at 14:41
  • Does this answer your question? [Is it possible to read parquet files in chunks?](https://stackoverflow.com/questions/59098785/is-it-possible-to-read-parquet-files-in-chunks) – Vikram Jan 17 '23 at 20:02
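
Following the dask suggestion in the comments, the approach might look roughly like the sketch below. This is untested; add_extra_columns, existing_col, and new_col are placeholders for the actual pandas logic, and filepath is the same path as above.

import dask.dataframe as dd

ddf = dd.read_parquet(filepath)  # lazy: no data is loaded yet

def add_extra_columns(pdf):
    # pdf is an ordinary pandas DataFrame holding one partition
    pdf["new_col"] = pdf["existing_col"] * 2  # placeholder for the real column logic
    return pdf

# apply the existing pandas-based code partition by partition
ddf = ddf.map_partitions(add_extra_columns)

ddf.to_parquet("output_dir/")  # writes the result as many smaller parquet files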

0 Answers