I have Parquet data in an S3 bucket that is partitioned by date,
e.g. s3://path/folder/
where the partitions in the folder are:
PRE date=2019-11-19/
PRE date=2019-11-20/
PRE date=2019-11-21/
PRE date=2019-11-22/
PRE date=2019-11-23/
PRE date=2019-11-26/
Each partition has millions of rows. I want to process the data by reading each partition in a for loop and appending the resulting DataFrame to another Parquet dataset, also partitioned by date. None of the solutions I've looked up on here address my specific use case, and the few that come close use something called boto, which I am not using.
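For reference, here is a rough sketch of what I am trying to do, using pandas with s3fs (the bucket paths, the output layout, and the transform step are placeholders, and I'm assuming the pyarrow engine is installed):

```python
import pandas as pd
import s3fs  # pandas uses s3fs under the hood for s3:// paths

# Placeholder paths -- not my real bucket names
SRC = "s3://path/folder"
DST = "s3://path/output"

fs = s3fs.S3FileSystem()

# List the date=YYYY-MM-DD partition prefixes under the source path.
# fs.ls() returns keys without the s3:// scheme.
for part in fs.ls(SRC):
    if "date=" not in part:
        continue  # skip anything that isn't a date partition
    date_value = part.split("date=")[-1]

    # Read just this partition's Parquet files into a DataFrame
    df = pd.read_parquet(f"s3://{part}")

    # ... transform df here ...

    # Write the result back out, preserving the date partition layout
    df.to_parquet(f"{DST}/date={date_value}/part.parquet", index=False)
```

Is looping partition by partition like this a reasonable approach, or is there a better way to do this without loading everything into memory at once?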
Any insight would be greatly appreciated. Thank you.