
When trying to create a Spark SQL table that reads from a JSON file, it fails with the error below:

Error in SQL statement: AnalysisException: Found duplicate column(s) in the data schema: filename

As I understand it, this issue is related to the Spark runtime version, and an alternative approach for DataFrames is suggested in: Duplicate column in json file throw error when creating PySpark dataframe Databricks after upgrading runtime 7.3LTS(Spark3.0.1) to 9.1LTS(Spark3.1.2)

I am looking for a similar alternative for a Spark SQL table. I tried to emulate that approach for table creation, but I am not sure how to specify the schema while creating the table; any help is highly appreciated.
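For illustration, this is the kind of statement I was trying to emulate: declaring the schema explicitly in the CREATE TABLE statement, hoping Spark would then skip inferring it from the JSON data. The table name, column names, and path below are placeholders, not my actual ones:

```sql
-- Sketch only: declare columns explicitly instead of relying on schema
-- inference. Table name, columns, and path are hypothetical placeholders.
CREATE TABLE my_json_table (
  id STRING,
  filename STRING,
  value STRING
)
USING JSON
OPTIONS (path '/mnt/data/input.json');
```

I am unsure whether an explicit column list like this actually bypasses the duplicate-column check during schema resolution, which is the part I need help with.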

Note: my need is to create an external table in Databricks pointing to a JSON file that has the duplicate column(s) issue.
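One related setting I came across: if the duplicate columns differ only in letter case (for example filename vs fileName), enabling case sensitivity before creating the table may make Spark treat them as distinct columns. I am not sure this applies to my file, so this is only an assumption:

```sql
-- Assumption: the "duplicate" columns differ only by case; with case
-- sensitivity enabled, Spark treats them as two distinct columns.
SET spark.sql.caseSensitive = true;
```

This would not help if the JSON contains two keys with exactly the same name.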

I am creating the SQL table as below, which results in the error because the JSON file has some duplicate columns: code_snippet
