I have multiple JSON files with structures similar to these two:
{
  "fields": [
    {
      "a": 1,
      "b": "Mike",
      "c": "Jordan"
    },
    {
      "a": 2,
      "b": "Filip",
      "c": "White"
    }
  ]
}

{
  "fields": {
    "a": 2,
    "b": "Mark",
    "c": "Brown"
  }
}
which I load into a single DataFrame:
df = spark.read.option("multiLine", True).json("/path/to/jsons")
I need to extract only the values of "b" where "a" = 2 (in this case, Mark and Filip). I have two problems:
- "fields" holds two different types in the same column (a struct in one file, an array of structs in the other)
- I don't know how to extract only field "b"
I'm using PySpark.
Thanks in advance.
df.withColumn("b values", col("fields") ... ??)