I have an S3 folder structure like this:
bucketname/20211127123456/.parquet files
bucketname/20211127456789/.parquet files
bucketname/20211126123455/.parquet files
bucketname/20211126746352/.parquet files
bucketname/20211124123455/.parquet files
bucketname/20211124746352/.parquet files
Basically, for each day there are two folders, and inside each of them there are multiple parquet files that I want to read.
Say I want to read all files from the folders for 26th and 27th Nov.
Right now I have a boto3 function that gives me a Python list of the complete S3 paths of all parquet files that have 20211126
or 20211127
in the path, and I pass that list to spark.read
. Is there a better way to achieve this?
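For reference, a minimal sketch of my current approach (the helper name and the exact boto3 calls here are illustrative, not my exact code; `bucketname` is from the layout above):

```python
def keys_for_dates(keys, dates):
    """Keep keys whose top-level folder starts with one of the given dates.

    Folders are named <YYYYMMDD><suffix>, e.g. 20211127123456, so a prefix
    match on the first path component selects both folders for a day.
    """
    return [k for k in keys if k.split("/", 1)[0].startswith(tuple(dates))]

# In practice the key list comes from boto3 and the result goes to Spark,
# roughly like this:
#
#   import boto3
#   s3 = boto3.client("s3")
#   keys = [obj["Key"]
#           for page in s3.get_paginator("list_objects_v2").paginate(Bucket="bucketname")
#           for obj in page.get("Contents", [])]
#   paths = ["s3://bucketname/" + k
#            for k in keys_for_dates(keys, ["20211126", "20211127"])]
#   df = spark.read.parquet(*paths)

keys = [
    "20211127123456/part-0.parquet",
    "20211126746352/part-0.parquet",
    "20211124123455/part-0.parquet",
]
print(keys_for_dates(keys, ["20211126", "20211127"]))
# → ['20211127123456/part-0.parquet', '20211126746352/part-0.parquet']
```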