I have to convert a JSON file to a CSV file using a Spark DataFrame in Databricks. I tried the code below, but when I write the DataFrame I get the error "CSV data source does not support array data type", so I am unable to produce the CSV file. Can someone help me with this issue, and also explain how to remove _corrupt_string?

    import json

    data = r'/dbfs/FileStore/tables/ABC.json'
    print("This is json data ", data)

    def js_r(data):
        with open(data, encoding='utf-8') as f_in:
            return json.load(f_in)

    if __name__ == "__main__":
        dic_data_first = js_r(data)
        print("This is my dictionary", dic_data_first)
        keys = dic_data_first.keys()
        print("The original dict keys", keys)
        dic_data_second = {'my_items': dic_data_first['Data'] for key in keys}
        with open('/dbfs/FileStore/tables/ABC_1.json', 'w') as f:
            json.dump(dic_data_first, f)
        # Read the JSON back with Spark, then write it out as CSV
        df = sqlContext.read.json('dbfs:/FileStore/tables/ABC_1.json')
        print(df)
        df.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("/dbfs/FileStore/tables/ABC_1.csv")

The JSON data is as follows:

    {"Table": "test1",
     "Data": [
       {"aa": "1",
        "bb": "2"},
       {"aa": "ss",
        "bb": "dc"}
     ]
    }
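
For illustration only: the error arises because `Data` is an array of objects, and a CSV row can only hold scalar values. Below is a minimal plain-Python sketch (standard library only, not Spark) of flattening that array into one row per element before writing CSV. The helper names `flatten_records` and `to_csv_text` are hypothetical, and the sample string is copied from the JSON above. In Spark, the analogous step would presumably be `pyspark.sql.functions.explode` on the `Data` column followed by selecting the struct fields, but that is not tested here.

```python
import csv
import io
import json

# Sample document copied from the question (assumed to be the whole file).
raw = '{"Table": "test1", "Data": [{"aa": "1", "bb": "2"}, {"aa": "ss", "bb": "dc"}]}'


def flatten_records(doc):
    """Return one flat dict per element of doc["Data"], tagged with the table name."""
    return [{"Table": doc["Table"], **item} for item in doc["Data"]]


def to_csv_text(rows):
    """Serialize a list of flat dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_records(json.loads(raw))
csv_text = to_csv_text(rows)
print(csv_text)
```

Each element of `Data` becomes its own row, so no column holds an array any more.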
pythonUser
  • https://stackoverflow.com/a/28246154/8150685 – Error - Syntactical Remorse Apr 09 '19 at 15:48
  • Possible duplicate of [How can I convert JSON to CSV?](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) – Error - Syntactical Remorse Apr 09 '19 at 15:50
  • @Remorse None of the answers address my question, and I have updated the code as well. Can you please help me with this issue? – pythonUser Apr 10 '19 at 07:45
  • Please provide the output you want for the JSON you have. Also, what is `dic_data_second={'my_items':dic_data_first['DATA']for key in keys}` supposed to do? I don't think that line does what you think it does. I also don't understand what makes your problem different from that answer. Your JSON is different, but the skeleton of the code is the same. – Error - Syntactical Remorse Apr 10 '19 at 12:22
  • @Remorse Thanks for your reply. I have updated the question. dic_data_second gives me the DATA one, so I used keys to get the column headers. But when writing to the CSV file I get the error "CSV data source does not support array data type". Can you please help me with this? – pythonUser Apr 10 '19 at 13:40
  • Please post the error. Also, you aren't using the dict comprehension correctly. `{'my_items':dic_data_first['DATA']for key in keys}` is no different than `{'my_items':dic_data_first['DATA']}`, except that it has more overhead because it repeats the same operation for every key. – Error - Syntactical Remorse Apr 10 '19 at 13:42
  • @Remorse: Error : CSV data source does not support array> data type – pythonUser Apr 10 '19 at 13:49
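
A small sketch of the point made in the comments about the dict comprehension, assuming a dictionary shaped like the sample JSON (with the `DATA` key as written in the comments): because the key `'my_items'` is constant, every iteration overwrites the same entry, so the loop adds nothing.

```python
# Hypothetical stand-in for the dictionary loaded from the JSON file.
dic_data_first = {
    "Table": "test1",
    "DATA": [{"aa": "1", "bb": "2"}, {"aa": "ss", "bb": "dc"}],
}
keys = dic_data_first.keys()

# Comprehension from the question: same constant key assigned once per key in `keys`.
with_loop = {'my_items': dic_data_first['DATA'] for key in keys}

# Plain dict literal suggested in the comments.
without_loop = {'my_items': dic_data_first['DATA']}

# Both expressions build the same one-entry dictionary.
same = with_loop == without_loop
print(same)
```

Either way the value is still a list of dicts, so this step alone does not remove the array that the CSV writer rejects.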

0 Answers