I am trying to do a relatively simple task in Spark, but it is quickly becoming quite painful. I am parsing JSON and want to update a field after the JSON has been parsed. I want to do this after the parsing, because the JSON is complicated (deeply nested) with many elements:
"attributes" ->
"service1" ->
"service2" ->
...
"keyId"
However, that approach seems just as complicated. The generated Row does not seem to know about any columns other than the top-level ones ("attributes"/"keyId"), so, for example, I cannot use withColumn on the nested field, because the top-level Row does not see it.
jsonDf.map((parsedJson: Row) => {
  val targetFieldToReplace = parsedJson.getAs[Row](0).getList[Row](2).get(0).getAs[String](0)
  ???? // how do I write the new value back and return the full Row?
})
I am able to extract the value, but I don't know how to put it back. I've thought about converting everything into a Sequence, but that doesn't seem like a good idea, because it would flatten the nested structure. I could re-create the Row element by element, but at that point it feels wrong. What am I missing here?
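For concreteness, the element-by-element re-creation I'd like to avoid would look roughly like this (a sketch only; the indices follow the access path in the snippet above, and `withNewKeyId` is a name I made up — `Row.fromSeq` and `toSeq` do exist on Spark's `Row`):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: copy every level of the nested Row, swapping one leaf.
// Index 0 of the top row is "attributes"; index 2 of attributes is the list of
// service rows (getSeq is the Seq flavor of getList); index 0 of a service is keyId.
def withNewKeyId(parsedJson: Row, newKeyId: String): Row = {
  val attributes = parsedJson.getAs[Row](0)
  val services   = attributes.getSeq[Row](2)

  // rebuild from the leaf outward, one enclosing level at a time
  val patchedService  = Row.fromSeq(newKeyId +: services.head.toSeq.tail)
  val patchedServices = patchedService +: services.tail
  val patchedAttrs    = Row.fromSeq(attributes.toSeq.updated(2, patchedServices))
  Row.fromSeq(parsedJson.toSeq.updated(0, patchedAttrs))
}
```

It works, but every intermediate level has to be copied by hand, which is exactly the part that feels wrong.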