As shown in the code below, I am reading a JSON file into a DataFrame and then selecting some fields from that DataFrame into another one.
from pyspark.sql.functions import col

df_record = spark.read.json("path/to/file.JSON", multiLine=True)
df_basicInfo = df_record.select(col("key1").alias("ID"),
                                col("key2").alias("Status"),
                                col("key3.ResponseType").alias("ResponseType"),
                                col("key3.someIndicator").alias("SomeIndicator"))
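For context, a single record in the file looks roughly like this (field names are simplified stand-ins; the point is that key3 is a struct that sometimes omits ResponseType):

{
  "key1": "abc-123",
  "key2": "Active",
  "key3": {
    "someIndicator": true
  }
}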
The issue is that sometimes the JSON file does not have some of the keys that I try to fetch, like ResponseType. So it ends up throwing errors like:
org.apache.spark.sql.AnalysisException: No such struct field ResponseType
How can I get around this issue without forcing a schema at the time of read? Is it possible to make Spark return NULL under that column when the key is not available?
The question "how do I detect if a spark dataframe has a column" does explain how to detect whether a column is available in a DataFrame. This question, however, is about how to use that check to get the behavior described above.
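For reference, the check from that question looks roughly like this (my sketch, not verbatim from the linked post; the AnalysisException import path may differ between Spark versions):

from pyspark.sql.utils import AnalysisException

def has_column(df, col_name):
    # Ask Spark to resolve the (possibly nested) column; Spark raises
    # AnalysisException when the struct field does not exist.
    try:
        df[col_name]
        return True
    except AnalysisException:
        return False

What I can't figure out is how to combine this with the select above, e.g. guarding each col("key3.ResponseType") with has_column(df_record, "key3.ResponseType") so the missing field comes back as NULL instead of raising.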