
I am having issues loading multiple files into a dataframe in Databricks. When I load a Parquet file from an individual folder, it works fine, but when I try to load multiple files into the dataframe, the following error is returned:

DF = spark.read.parquet('S3 path/') 

"org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually."

Per other StackOverflow answers, I added spark.sql.files.ignoreCorruptFiles true to the cluster configuration, but it didn't seem to resolve the issue. Any other ideas?
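
For reference, here is the notebook-level equivalent of that cluster setting, plus a manual-schema read I could fall back to (a sketch only; the bucket path and column names below are placeholders):

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Session-level equivalent of the cluster config I added
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

# Fallback: give Spark the schema explicitly so it never has to infer it
schema = StructType([
    StructField("id", LongType(), True),       # placeholder column
    StructField("value", StringType(), True),  # placeholder column
])

df = spark.read.schema(schema).parquet("s3://my-bucket/data/")  # placeholder path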

Chrisw

Have a look at: https://stackoverflow.com/questions/44954892/unable-to-infer-schema-when-loading-parquet-file. – venus Jan 03 '20 at 19:48
As Venus is suggesting, stop writing empty files to avoid issues while reading. – Salim Jan 03 '20 at 23:46
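
Update: based on the comments, one thing I could try is filtering out empty files before reading. A sketch of that idea (assumes Databricks' dbutils; the bucket path is a placeholder, and this only scans one level below the prefix):

# Keep only non-empty .parquet files directly under the prefix
paths = [
    f.path
    for f in dbutils.fs.ls("s3://my-bucket/data/")  # placeholder path
    if f.size > 0 and f.path.endswith(".parquet")
]

# Pass the surviving paths explicitly instead of the whole folder
df = spark.read.parquet(*paths)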

0 Answers