I have a large parquet file that I can read into a pandas DataFrame with read_parquet(). However, I want to process the file chunk by chunk and then build the processed DataFrame from the pieces. Is there any way I can achieve this? read_csv() with chunksize is not an option in my case. Thanks in advance.
- This might help you: https://stackoverflow.com/questions/59098785/is-it-possible-to-read-parquet-files-in-chunks – YoungTim Nov 03 '21 at 16:56
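The linked question points toward pyarrow, whose ParquetFile.iter_batches() yields record batches of a chosen size. A minimal sketch of that approach; the file name data.parquet and the batch size are placeholders:

```python
import pyarrow.parquet as pq
import pandas as pd

pf = pq.ParquetFile("data.parquet")  # placeholder path

processed = []
for batch in pf.iter_batches(batch_size=100_000):  # one RecordBatch per chunk
    chunk = batch.to_pandas()   # convert the chunk to a pandas DataFrame
    processed.append(chunk)     # replace with your per-chunk processing

result = pd.concat(processed, ignore_index=True)
```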
- Look at the Iteration section of the fastparquet Python library: https://fastparquet.readthedocs.io/en/latest/details.html#iteration – darked89 Nov 03 '21 at 22:29
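With fastparquet, iter_row_groups() yields one pandas DataFrame per row group of the file. A minimal sketch, again with data.parquet as a placeholder path:

```python
from fastparquet import ParquetFile
import pandas as pd

pf = ParquetFile("data.parquet")  # placeholder path

processed = []
for chunk in pf.iter_row_groups():  # one DataFrame per row group
    processed.append(chunk)         # replace with your per-chunk processing

result = pd.concat(processed, ignore_index=True)
```

Note that with this approach the chunk size is fixed by the row groups written into the file, so unlike pyarrow's iter_batches() it cannot be chosen freely at read time.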