0

I have a string-type column that contains a list of elements, each of which is a nested dictionary object. For example:

str = '[{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["nyc","atlanta"],
"description":"some description",
"content_type":"media"},
{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["chicago","dallas"],
"description":"some description2",
"content_type":"web"},
{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["las vegas","buffalo"],
"description":"some description3",
"content_type":"media"}]'

This is actually a column in a Spark dataframe, and the column is of string type. I want to know how to convert the contents of the string into a list so that I can convert each element in the list using json.loads.

Any idea?

cs95
Arvind Kandaswamy
  • 2
    Minor syntax issue aside (newlines in a single-quoted string), what you have can be passed directly to `json.loads`, which will decode it to a list of Python `dict`s. – chepner Feb 03 '18 at 02:14
  • 1
    What's the end goal here? There are JSON functions in PySpark. I wonder if this is an [XY problem](https://meta.stackexchange.com/a/66378). – pault Feb 03 '18 at 02:22
  • It is indeed an XY problem. The end goal is to retrieve the nested JSON object from a PySpark dataframe. Thanks for pointing that out. – Arvind Kandaswamy Feb 03 '18 at 19:42

3 Answers

1

json.loads should work fine with this data - it will return a list of dicts.
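
A minimal sketch of what that looks like, assuming a shortened copy of the question's data (two elements, fewer keys). The names `raw`, `records`, and the three result lists are only illustrative:

```python
import json

# Shortened copy of the string from the question (two elements instead of three).
raw = '''[{"method": {"super": 0.03, "normal": 0.8},
 "city": ["nyc", "atlanta"],
 "content_type": "media"},
{"method": {"super": 0.03, "normal": 0.8},
 "city": ["chicago", "dallas"],
 "content_type": "web"}]'''

# json.loads parses the whole array in one call; newlines between tokens are fine.
records = json.loads(raw)

# Each element is an ordinary dict, so nested values are reached by key.
content_types = [rec["content_type"] for rec in records]
normal_scores = [rec["method"]["normal"] for rec in records]
first_cities = [rec["city"][0] for rec in records]
```

Because the result is already a list of dicts, there is no separate per-element decoding step.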

stuartgm
1
import json
msg = '''[{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["nyc","atlanta"],
"description":"some description",
"content_type":"media"},
{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["chicago","dallas"],
"description":"some description2",
"content_type":"web"},
{"method":
{"super":0.03,"normal":0.8,"par":0.15,"goal":0.01,"fact":0.04},
"city":["las vegas","buffalo"],
"description":"some description3",
"content_type":"media"}]'''
json.loads(msg)[0]



 out:
{'city': ['nyc', 'atlanta'],
 'content_type': 'media',
 'description': 'some description',
 'method': {'fact': 0.04,
  'goal': 0.01,
  'normal': 0.8,
  'par': 0.15,
  'super': 0.03}}
not_python
-1

There are a few steps to solve this, and I hope that json.loads is what you actually need when you're done. If you follow these steps, your string will be converted into a list of strings, one per dictionary (and there is almost no need for json.loads).

First, remove the brackets "[]" from the string. Assuming you assigned your string to the variable "value", run these lines:

value = value[1:]
value = value[:-1]

Next, we are going to prepare to use Python's .split() method, which turns a string into a list by separating it at any character you select. The problem is that there are two levels of commas in your string: one between the list elements, and one between the dictionary values. So before we split the string, let's turn the commas that separate your list elements into semicolons so that split can break the string at the proper place. Use this code:

value = value.replace('},','};') 

Lastly, split the string into a list using this code: value = value.split(';')
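
To check these steps, here is a small sketch on a shortened, single-line copy of the question's data. The sample string and the names `stripped`, `pieces`, and `element_strings` are illustrative, not from the question. One thing it shows is that `},` also closes the nested "method" object, so the split can produce more pieces than there are list elements. If the goal is one JSON string per element, parsing first and re-serializing each element with `json.dumps` is one possible alternative:

```python
import json

# Shortened single-line sample with two list elements (illustrative data).
value = ('[{"method":{"super":0.03,"fact":0.04},"city":["nyc","atlanta"],"content_type":"media"},'
         '{"method":{"super":0.03,"fact":0.04},"city":["chicago","dallas"],"content_type":"web"}]')

# The steps described above: strip the brackets, mark "}," with ";", then split.
stripped = value[1:-1]
pieces = stripped.replace('},', '};').split(';')

# Alternative: parse the whole array, then turn each element back into a JSON string.
element_strings = [json.dumps(item) for item in json.loads(value)]

print(len(pieces))           # number of pieces from the split approach
print(len(element_strings))  # number of elements in the JSON array
```

Each entry of `element_strings` can be passed to `json.loads` on its own.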

Let me know if this works for you. Good luck!