I have a JSON input containing an array that I explode and flatten as follows:
from pyspark.sql.functions import col, explode_outer

# Explode the array "x" (keeping a row even when it is empty or null)
# and flatten the nested fields into top-level columns.
new_df = df \
    .withColumn("x", explode_outer(col("x"))) \
    .select(
        col("x.p").alias("xp"),
        col("x.q").alias("xq"),
        col("x.r.l.g").alias("xrlg"),
        col("x.r.m.f").alias("xrmf"),
        col("x.r.n").alias("xrn"),
        col("x.r.o").alias("xro"),
        col("x.r.s").alias("xrs"),
    )
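For context, a record with the shape this code expects would look something like this (the values here are made up):

{"x": [{"p": "p1", "q": "q1", "r": {"l": {"g": "g1"}, "m": {"f": "f1"}, "n": "n1", "o": "o1", "s": "s1"}}]}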
Sometimes the input file is empty, or it does not contain the JSON key 'x'. In those cases the PySpark code fails with:

cannot resolve 'x' given input columns: []
Is there a way to keep all of these columns and populate them with NULL when the key is not present in the input JSON?
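For illustration, one direction that occurs to me is supplying an explicit schema at read time, so that a missing 'x' resolves to a NULL column instead of an unresolvable reference. A rough sketch of what I mean (the leaf types are guesses on my part, spark is an active SparkSession, and "input.json" stands in for the real path):

from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Guessed schema covering the nested paths selected above;
# all leaf types are assumed to be strings purely for illustration.
schema = StructType([
    StructField("x", ArrayType(StructType([
        StructField("p", StringType()),
        StructField("q", StringType()),
        StructField("r", StructType([
            StructField("l", StructType([StructField("g", StringType())])),
            StructField("m", StructType([StructField("f", StringType())])),
            StructField("n", StringType()),
            StructField("o", StringType()),
            StructField("s", StringType()),
        ])),
    ])))
])

# With an explicit schema, a file without "x" still yields the column
# (as NULL), and explode_outer then produces a single all-NULL row
# that the select above can flatten as usual.
df = spark.read.schema(schema).json("input.json")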