I am having trouble converting a .csv file to a multiline JSON file using PySpark.
I read the CSV file with Spark and need to write it out as multiline JSON, keyed by one of the columns.
Here is my code:
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jsonconversion").getOrCreate()
df = spark.read.format("csv").option("header", "True").load(csv_file)
df.show()
df_json = df.toJSON()
for row in df_json.collect():
    line = json.loads(row)
    result = []
    for key, value in list(line.items()):
        if key == 'FieldName':
            FieldName = line['FieldName']
            del line['FieldName']
            result.append({FieldName: line})
            res = result
    with open("D:/tasklist/jsaonoutput.json", 'a+') as f:
        f.write(json.dumps(res, indent=4, separators=(',', ':')))
I need the output in the format below:
{
    "Name": {
        "DataType": "String",
        "Length": 4,
        "Required": "Y",
        "Output": "Y",
        "Address": "N",
        "Phone Number": "N",
        "DoorNumber": "N/A",
        "Street": "N",
        "Locality": "N/A",
        "State": "N/A"
    }
}
My input CSV file looks like this:
I am new to PySpark; any pointers on turning this into working code would be much appreciated.
Thank you in advance.
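
One possible direction, as a minimal sketch rather than a definitive fix: collect the rows once, build a single dictionary keyed by `FieldName`, and write the file once in `"w"` mode, so the output is one valid JSON document instead of several fragments appended with `'a+'`. This assumes the CSV has a header row with a `FieldName` column (as the code above implies) and that the remaining columns match the keys in the desired output; since the CSV sample is not shown, that layout is a guess. The function names `rows_to_nested` and `csv_to_nested_json` are made up for illustration.

```python
import json


def rows_to_nested(rows, key_field="FieldName"):
    """Build {<key_field value>: {remaining columns}} from an iterable of row dicts.

    Hypothetical helper; assumes every row contains key_field.
    """
    nested = {}
    for row in rows:
        row = dict(row)            # copy so the caller's dict is not mutated
        name = row.pop(key_field)  # e.g. "Name"
        nested[name] = row         # remaining columns become the inner object
    return nested


def csv_to_nested_json(csv_path, out_path, key_field="FieldName"):
    """Read a CSV with Spark and write one nested JSON document (sketch)."""
    # Imported here so rows_to_nested can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jsonconversion").getOrCreate()
    df = (
        spark.read.format("csv")
        .option("header", "true")
        # Assumption: let Spark infer types so a column like Length is
        # written as a number rather than a string.
        .option("inferSchema", "true")
        .load(csv_path)
    )

    # collect() brings every row to the driver, which is only reasonable
    # for a small file such as a list of field definitions.
    rows = [r.asDict() for r in df.collect()]
    nested = rows_to_nested(rows, key_field)

    # Write once, in "w" mode, so the file holds a single JSON document.
    with open(out_path, "w") as f:
        json.dump(nested, f, indent=4)
    return nested
```

With the names from the question, the call would be `csv_to_nested_json(csv_file, "D:/tasklist/jsaonoutput.json")`. If two rows share the same `FieldName`, the later row overwrites the earlier one in this sketch.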