I have a dataset that is too big to handle as a single dataframe. It also takes a long time to read the whole dataset from the database every time. I once tried writing the data out in Parquet format and reading it back with read_parquet, which was much faster.
So my question is: can I read the data from the database in chunks with read_sql, write each chunk to Parquet with pandas to_parquet, then read the next chunk (deleting the previous one to save RAM) and append it to the same Parquet file, and so on?
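Roughly, this is the kind of loop I have in mind. It's only a sketch: I'm assuming I can go through pyarrow's ParquetWriter to append successive chunks as row groups to one file, and the connection string, table name, and chunk size are just placeholders:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host/dbname")  # placeholder connection
writer = None

# Read the table in chunks so only one chunk is in memory at a time
for chunk in pd.read_sql("SELECT * FROM my_table", engine, chunksize=100_000):
    table = pa.Table.from_pandas(chunk)
    if writer is None:
        # Open the output file once, using the schema of the first chunk
        writer = pq.ParquetWriter("data.parquet", table.schema)
    writer.write_table(table)  # append this chunk as a new row group
    del chunk                  # drop the previous chunk before reading the next one

if writer is not None:
    writer.close()
```

Is something like this a sensible way to do it, or is there a better-supported way to append chunks to a Parquet file from pandas?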