I am doing some analysis using a PySpark DataFrame.

One column is called json_data. It looks like this:

[image: sample of the json_data column]

Then I tried to convert it to a dictionary using the following code:

from pyspark.sql.functions import udf
func = udf(lambda x: eval(x))
df_beer = df_beer.withColumn('json_data_new', func(df_beer.json_data))

After the conversion, the new column 'json_data_new' looks like this:

[image: sample of the json_data_new column]

Then I checked the data types of the old and new columns; both of them are still string type.

[image: data types of the two columns]

Question: How can I extract the numbers linked to the key "2_QTDE" and save them as a new column?

I know it is a JSON-like string, and I have had a hard time dealing with this format.

I tried the Python way of looking up the dictionary key, but it does not work.

So I thought maybe I need to write a function to extract the numbers from json_data_new:

df_beer = df_beer.withColumn('newColumn', func_extract(df_beer.json_data_new))

How do I properly define the function func_extract? Thanks!
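
For reference, here is a minimal sketch of what func_extract could look like. It rests on assumptions the post does not confirm: that each json_data value is a flat object with "2_QTDE" as a top-level key holding a single number, and that the string is either strict JSON or a Python-style dict literal. The names `extract_value` and `add_extracted_column` are hypothetical. Note that `udf` returns `StringType` by default when no return type is given, which would explain why json_data_new still shows up as a string.

```python
import ast
import json


def extract_value(raw, key="2_QTDE"):
    """Return the number stored under `key` as a float, or None if unreadable."""
    if raw is None:
        return None
    try:
        parsed = json.loads(raw)  # strict JSON with double-quoted keys
    except (TypeError, ValueError):
        try:
            # Fall back to Python-literal syntax (e.g. single quotes) without
            # executing arbitrary code the way eval() would.
            parsed = ast.literal_eval(raw)
        except (TypeError, ValueError, SyntaxError):
            return None
    if not isinstance(parsed, dict):
        return None
    try:
        return float(parsed.get(key))
    except (TypeError, ValueError):
        return None  # key missing or value not numeric


def add_extracted_column(df, source_col="json_data", target_col="newColumn"):
    """Attach the extracted value to a Spark DataFrame as a double column."""
    # Imported here so extract_value stays usable without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # An explicit return type keeps the new column from defaulting to string.
    func_extract = udf(extract_value, DoubleType())
    return df.withColumn(target_col, func_extract(df[source_col]))
```

With this in place, `df_beer = add_extracted_column(df_beer)` would read directly from json_data, so the intermediate `eval` step is not needed. If the real data is nested or the key holds a list, the lookup inside `extract_value` would have to change accordingly.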

Elsa Li
  • Please [don't post pictures of code](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question). – pault Apr 18 '18 at 15:02
  • Related to [How to query JSON data column using Spark DataFrames?](https://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes) – Alper t. Turker Apr 18 '18 at 15:04
  • Although the linked dupe is in Scala, the answer is essentially the same. You need to define a schema for your JSON, then convert the column into a `StructType` using [`pyspark.sql.functions.from_json`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.from_json). After that you can access items from the struct using [`pyspark.sql.Column.getItem(key)`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.Column.getItem). – pault Apr 18 '18 at 15:09
  • @pault Thanks for the suggestion. Since the column is of string type, I wonder whether it is possible to use a regular expression match instead? – Elsa Li Apr 18 '18 at 17:18
  • @QianLi you probably *could* use regex, but I would not recommend it. Read [this post](https://stackoverflow.com/questions/8750127/regex-for-parsing-single-key-values-out-of-json-in-javascript). – pault Apr 18 '18 at 17:44
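
The schema-based approach suggested in the comments could look roughly like the sketch below. It assumes that "2_QTDE" is a top-level key holding a single number, which the post does not confirm; the function name `add_qtde_with_from_json` and the constant `QTDE_KEY` are hypothetical.

```python
QTDE_KEY = "2_QTDE"  # key named in the question


def add_qtde_with_from_json(df, source_col="json_data", target_col="newColumn"):
    """Parse the JSON string column with a schema and pull out one field."""
    # Imported inside the function so this module loads even without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StructField, StructType

    # Assumed schema: one nullable numeric field; other keys are ignored.
    schema = StructType([StructField(QTDE_KEY, DoubleType(), True)])

    # from_json turns the string into a struct column; rows that cannot be
    # parsed become null instead of raising an error.
    parsed = F.from_json(F.col(source_col), schema)

    # getItem reads the named field out of the struct.
    return df.withColumn(target_col, parsed.getItem(QTDE_KEY))
```

Usage would be `df_beer = add_qtde_with_from_json(df_beer)`. Compared with a Python UDF or a regex, this keeps the parsing inside Spark's built-in functions, but it only works if the schema matches the actual shape of the data.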

0 Answers