I am extracting JSON data from an API and trying to write it to an Azure container path. The data displays correctly in the notebook, but when I write it out as JSON most of the values are NULL. Where am I going wrong?

import requests
from pandas import json_normalize  # assumed import; the original post did not show it

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer " + str(token),
}

# Call the API and parse the JSON body
response_get = requests.get(getURL, headers=headers)
response_final = response_get.json()
print("Type:", type(response_final))

# Flatten with pandas, then convert to a Spark DataFrame
data = json_normalize(response_final)
df = spark.createDataFrame(data)
## df.coalesce(1).write.parquet(stagingpath, mode='overwrite')
df.coalesce(1).write.json(stagingpath, mode='overwrite')
Arun.K

1 Answer

I reproduced this in my environment and got the expected result with the process below, following the Microsoft documentation and a related SO thread:

import requests

# Fetch the sample payload and pass the raw JSON text to Spark
response = requests.get('https://reqres.in/api/users?page=3')
rdd = spark.sparkContext.parallelize([response.text])
df = spark.read.json(rdd)
df.show()

# Mount the blob container; "SAS" is a placeholder for the SAS token
dbutils.fs.mount(
    source="wasbs://mycontainer@myblobstorageaccount.blob.core.windows.net",
    mount_point="/mnt/mymountpoint",
    extra_configs={"fs.azure.sas.mycontainer.myblobstorageaccount.blob.core.windows.net": "SAS"},
)
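
The approach above hands JSON text to spark.read.json instead of going through pandas. If the API wraps its records in an outer object, one option is to serialize each record to its own JSON string first, so that Spark gets one row per record. The following is only a sketch under assumptions: the helper name to_json_lines, the record_key parameter, and the sample payload are all hypothetical and not part of the original post.

```python
import json


def to_json_lines(payload, record_key=None):
    """Return one JSON string per record found in a parsed API payload.

    record_key is a hypothetical parameter: the key under which the API
    nests its list of records (pass None if the payload is the list itself).
    """
    records = payload[record_key] if record_key is not None else payload
    # A single object is treated as a one-record list
    if isinstance(records, dict):
        records = [records]
    return [json.dumps(record) for record in records]


# Made-up payload for illustration only
sample = {
    "page": 3,
    "data": [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
    ],
}

lines = to_json_lines(sample, record_key="data")
for line in lines:
    print(line)
```

In a notebook with a Spark session available, the resulting list could then be read with spark.read.json(spark.sparkContext.parallelize(lines)), which lets Spark infer the schema from the JSON itself.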


Then run the script below to write the JSON:

df.coalesce(1).write.json( "/mnt/mymountpoint/vamo.json")


Output:

Click on the folder vamo.json:


Click on part-00xxx:


Then click on View/Edit:

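
The same check can be done in code rather than through the portal. Spark writes a folder, not a single file, and the data sits in the part-... files as one JSON object per line. The sketch below parses those files with the standard library only; the function name read_spark_json_output and the demo file names are hypothetical, and it assumes the output is uncompressed.

```python
import json
import tempfile
from pathlib import Path


def read_spark_json_output(folder):
    """Parse every JSON Lines record from the part files in a Spark output folder."""
    records = []
    # The "part-*" pattern picks up the data files and skips markers such as _SUCCESS
    for part in sorted(Path(folder).glob("part-*")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    records.append(json.loads(line))
    return records


# Demo against a throwaway folder laid out like Spark's output (made-up contents)
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "vamo.json"
    out.mkdir()
    (out / "_SUCCESS").write_text("")
    (out / "part-00000-demo.json").write_text(
        '{"id": 1, "email": "a@example.com"}\n{"id": 2}\n'
    )
    rows = read_spark_json_output(out)

print(rows)
```

On Databricks the mounted folder is usually reachable from plain Python under a path like /dbfs/mnt/mymountpoint/vamo.json, though that depends on the cluster configuration. Note that in Spark 3.x, fields whose value is null are typically left out of the written JSON unless the ignoreNullFields option is changed, so a missing key in a part file corresponds to a NULL in the DataFrame.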

RithwikBojja