I have a parquet file that is about 350 GB in size, so I want to read the data in chunks.

I know how to read the full parquet file and then convert it to pandas, as below:

import pyarrow.parquet as pq
table = pq.read_table(filepath)
df = table.to_pandas(integer_object_nulls=True)

I am not sure whether it is possible to read the data chunk by chunk. Can someone please clarify this?
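
Would something along these lines be the right approach? This is only a rough sketch of what I have in mind, using pyarrow's ParquetFile.iter_batches (the batch_size of 100_000 is just a placeholder and I have not tested this on my data):

import pyarrow.parquet as pq

parquet_file = pq.ParquetFile(filepath)  # opens metadata only, not the full 350 GB

# iterate over bounded-size record batches and convert each one to pandas
for batch in parquet_file.iter_batches(batch_size=100_000):
    chunk_df = batch.to_pandas(integer_object_nulls=True)
    # ... process chunk_df here (e.g. add the extra columns) ...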

  • Have you considered using [dask](https://dask.org/)? Pandas doesn't have the option to read by chunk. – rpanai Jun 29 '21 at 13:56
  • I need to use an already developed framework to add some extra columns (it uses pandas). I do not have much knowledge of dask. Is it possible to read the data chunk-wise in dask and then convert to pandas, so I can use my other Python code to add the new fields? – Py1996 Jun 29 '21 at 14:05
  • You can read the big parquet file with dask and save in many smaller parquet or use basically all your code to add new columns. If you can produce a [mcve](/help/mcve) I can try to help you. – rpanai Jun 29 '21 at 14:41
  • Does this answer your question? [Is it possible to read parquet files in chunks?](https://stackoverflow.com/questions/59098785/is-it-possible-to-read-parquet-files-in-chunks) – Vikram Jan 17 '23 at 20:02
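
Following the dask suggestion in the comments, the approach might look roughly like the sketch below. This is untested; add_extra_columns, existing_col, and new_col are placeholders for the actual pandas logic, and filepath is the same path as above.

import dask.dataframe as dd

ddf = dd.read_parquet(filepath)  # lazy: no data is loaded yet

def add_extra_columns(pdf):
    # pdf is an ordinary pandas DataFrame holding one partition
    pdf["new_col"] = pdf["existing_col"] * 2  # placeholder for the real column logic
    return pdf

# apply the existing pandas-based code partition by partition
ddf = ddf.map_partitions(add_extra_columns)

ddf.to_parquet("output_dir/")  # writes the result as many smaller parquet files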

0 Answers