I would like to read multiple parquet files into a dataframe from S3. Currently, I'm using the following method to do this:
files = ['s3a://dev/2017/01/03/data.parquet',
's3a://dev/2017/01/02/data.parquet']
df = session.read.parquet(*files)
This works if all of the files exist on S3, but I would like to ask for a list of files to be loaded into a dataframe without breaking when some of the files in the list don't exist. In other words, I would like for sparkSql to load as many of the files as it finds into the dataframe, and return this result without complaining. Is this possible?