Let's say I have a CSV file with several hundred million records, and I want to convert it to Parquet using Python and Pandas to read the CSV and write the Parquet output. Because the file is too big to read into memory and write out as a single Parquet file, I decided to read the CSV in chunks of 5M records and create a separate Parquet file for every chunk. Why would I want to merge all those Parquet files into a single Parquet file?
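For reference, the chunked approach I'm describing looks roughly like this (the paths and chunk size are just placeholders; writing Parquet requires pyarrow or fastparquet to be installed):

```python
import pandas as pd

csv_path = "big_input.csv"   # placeholder path to the large CSV
chunk_size = 5_000_000       # 5M rows per chunk, as described above

# read_csv with chunksize returns an iterator of DataFrames,
# so only one chunk is held in memory at a time
for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunk_size)):
    # write each chunk out as its own Parquet file
    chunk.to_parquet(f"part_{i:04d}.parquet", index=False)
```

This produces one Parquet file per chunk, which is the point where I'm wondering whether merging them afterwards is worth it.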
Thanks in advance.