I have the following DataFrame in Spark:
root
|-- user_id: string (nullable = true)
|-- payload: string (nullable = true)
Here payload is a JSON string with no fixed schema. Some sample data:
{'user_id': '001','payload': '{"country":"US","time":"11111"}'}
{'user_id': '002','payload': '{"message_id":"8936716"}'}
{'user_id': '003','payload': '{"brand":"adidas","when":""}'}
I want to output the above data in JSON format with the payload flattened, that is, with the key-value pairs extracted from payload and placed at the root level. For example:
{'user_id': '001','country':'US','time':'11111'}
{'user_id': '002','message_id':'8936716'}
{'user_id': '003','brand':'adidas','when':''}
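To make the intended transformation concrete, here is a minimal plain-Python sketch of what I want to happen to each row. It is not Spark code, and the names flatten_record, rows and flattened are made up for illustration. It assumes each payload is a flat JSON object and that, if a payload key collides with an existing root-level field, the root-level value is kept.

```python
import json


def flatten_record(row):
    """Return a new dict with the payload's key-value pairs merged into the root level."""
    # Copy every root-level field except the raw payload string.
    flat = {key: value for key, value in row.items() if key != "payload"}
    payload = row.get("payload")
    if payload:
        # The payload is a JSON string, so it has to be parsed first.
        for key, value in json.loads(payload).items():
            # Assumption: on a name clash the root-level value wins.
            flat.setdefault(key, value)
    return flat


# The sample data from the question.
rows = [
    {"user_id": "001", "payload": '{"country":"US","time":"11111"}'},
    {"user_id": "002", "payload": '{"message_id":"8936716"}'},
    {"user_id": "003", "payload": '{"brand":"adidas","when":""}'},
]

flattened = [flatten_record(row) for row in rows]

for record in flattened:
    print(json.dumps(record))
```

In Spark the same per-row logic could presumably be applied with something like df.rdd.map(lambda r: json.dumps(flatten_record(r.asDict()))), but I have not verified that this is the idiomatic or efficient way to do it.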
Stack Overflow flagged this as a duplicate of "Flatten Nested Spark Dataframe", but it is not. The difference is that in my case the value of payload is just a string, not a nested struct.