I'm trying to create a dataset from a json-string within a dataframe in Databricks 3.5 (Spark 2.2.1). In the code block below 'jsonSchema' is a StructType with the correct layout for the json-string which is in the 'body' column of the dataframe.
val newDF = oldDF.select(from_json($"body".cast("string"), jsonSchema))
This returns a dataframe where the root object is
jsontostructs(CAST(body AS STRING)):struct
followed by the fields in the schema (looks correct). When I try another select on the newDF
val transform = newDF.select($"propertyNameInTheParsedJsonObject")
it throws the exception
org.apache.spark.sql.AnalysisException: cannot resolve '`columnName`' given
input columns: [jsontostructs(CAST(body AS STRING))];;
I'm aparently missing something. I hoped from_json would return a dataframe I could manipulate further.
My ultimate objective is to cast the json-string within the oldDF body-column to a dataset.