I'm loading a set of JSON files from a folder. Each file name ends with a number, so I'm using a wildcard to load all of them at once:
raw_assignments_2 = spark.read.option("multiline", "true").schema(schema).json("Assignments_*.json")
Some files are missing one key/value pair, and Spark ignores those files when reading the data into the DataFrame.
For example, file 1 contains the following keys and values:
[{ "id": 8731,
"resource_type": "assignment",
"assignee_id": 2478,
"status": "complete"}]
File 2 contains only three keys:
[{ "id": 8731,
"resource_type":"assignment",
"assignee_id":2478}]
id, resource_type, and assignee_id are mandatory fields that I expect in every JSON file, whereas status is optional. How can I load status into the DataFrame and set it to null when the key is missing from a file?
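When Spark appears to skip whole files, one possible cause is a file that is not valid JSON (for example, a missing comma between two key/value pairs) rather than a missing optional key. The following pure-Python sketch is a diagnostic aid, not a Spark feature: it reports which files fail to parse or lack a mandatory key. The helper name `check_assignment_files` and the `MANDATORY` tuple are hypothetical, and the sketch assumes each file holds either a JSON array of objects or a single object.

```python
import json
from pathlib import Path

# Hypothetical list of keys every record must contain; "status" is optional.
MANDATORY = ("id", "resource_type", "assignee_id")


def check_assignment_files(folder, pattern="Assignments_*.json"):
    """Return {file name: problem} for files that are not valid JSON or
    whose records lack a mandatory key. A missing "status" is not reported."""
    problems = {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            records = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems[path.name] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
            continue
        if isinstance(records, dict):  # single object instead of an array
            records = [records]
        for i, record in enumerate(records):
            missing = [key for key in MANDATORY if key not in record]
            if missing:
                problems[path.name] = f"record {i} missing {missing}"
                break
    return problems
```

Files that pass this check but still disappear from the DataFrame point back to the read options or schema rather than the data itself.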