I am doing some analysis using a PySpark DataFrame.
One column is called json_data. It looks like this:
Then I tried to convert it to a dictionary using the following code:
from pyspark.sql.functions import udf
func = udf(lambda x: eval(x))
df_beer = df_beer.withColumn('json_data_new', func(df_beer.json_data))
After the conversion, the new column 'json_data_new' looks like this:
Then I checked the data types of the old and new columns; both are still string type.
Question: How can I extract the numbers linked to the key "2_QTDE" and save them as a new column?
I know it is a JSON-like string, and I have had a hard time dealing with this format.
I tried the Python way of indexing by dictionary key, but it does not work.
So I thought maybe I need to write a function to extract the numbers from json_data_new:
df_beer = df_beer.withColumn('newColumn', func_extract(df_beer.json_data_new))
How do I properly define the function func_extract? Thanks!
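For reference, here is a minimal sketch of the kind of function I have in mind. It is not a confirmed solution. It assumes the value under "2_QTDE" is a single number (or a numeric string) and that each row is either valid JSON or a Python-dict-style string. The names extract_qtde and add_qtde_column are hypothetical helpers I made up for illustration. As far as I understand, udf defaults to a string return type when none is given, which would explain why json_data_new stayed a string, so the sketch passes DoubleType explicitly.

```python
import ast
import json


def extract_qtde(raw, key="2_QTDE"):
    """Return the number stored under `key` as a float, or None if unavailable."""
    if raw is None:
        return None
    try:
        # Strict JSON first (double-quoted keys and strings).
        data = json.loads(raw)
    except (TypeError, ValueError):
        try:
            # Fall back to Python-literal syntax (single quotes), without eval().
            data = ast.literal_eval(raw)
        except (TypeError, ValueError, SyntaxError):
            return None
    if not isinstance(data, dict) or data.get(key) is None:
        return None
    try:
        # Assumption: the value is one number, not a list or nested object.
        return float(data[key])
    except (TypeError, ValueError):
        return None


def add_qtde_column(df, source_col="json_data", target_col="newColumn"):
    """Hypothetical wrapper: add `target_col` holding the extracted number."""
    # Imported here so extract_qtde can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # An explicit return type keeps the new column from being a string.
    func_extract = udf(extract_qtde, DoubleType())
    return df.withColumn(target_col, func_extract(df[source_col]))
```

With something like this, the call would be df_beer = add_qtde_column(df_beer). If the column turns out to be valid JSON, built-in functions such as get_json_object or from_json might avoid the UDF entirely, but I have not verified that against this data.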