
I am having issues loading multiple files into a dataframe in Databricks. When I load a Parquet file from an individual folder, it works fine, but when I try to load multiple files into the dataframe, the following error is returned:

DF = spark.read.parquet('S3 path/') 

"org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually."

Per other StackOverflow answers, I added spark.sql.files.ignoreCorruptFiles true to the cluster configuration, but it didn't seem to resolve the issue. Any other ideas?
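
For reference, here is the notebook-level equivalent of that cluster setting, plus a manual-schema read I could fall back to (a sketch only; the bucket path and column names below are placeholders):

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Session-level equivalent of the cluster config I added
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

# Fallback: give Spark the schema explicitly so it never has to infer it
schema = StructType([
    StructField("id", LongType(), True),       # placeholder column
    StructField("value", StringType(), True),  # placeholder column
])

df = spark.read.schema(schema).parquet("s3://my-bucket/data/")  # placeholder path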

Chrisw

Have a look at: https://stackoverflow.com/questions/44954892/unable-to-infer-schema-when-loading-parquet-file. – venus Jan 03 '20 at 19:48
As Venus is suggesting, stop writing empty files to avoid issues while reading. – Salim Jan 03 '20 at 23:46
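
Update: based on the comments, one thing I could try is filtering out empty files before reading. A sketch of that idea (assumes Databricks' dbutils; the bucket path is a placeholder, and this only scans one level below the prefix):

# Keep only non-empty .parquet files directly under the prefix
paths = [
    f.path
    for f in dbutils.fs.ls("s3://my-bucket/data/")  # placeholder path
    if f.size > 0 and f.path.endswith(".parquet")
]

# Pass the surviving paths explicitly instead of the whole folder
df = spark.read.parquet(*paths)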

0 Answers