Access a json column with pandas

Question

I have a CSV file where one column contains JSON. I want to access the information in the json column, but I can't figure out how.

My CSV file looks like this:

id,"letter","json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"

I'm reading in the file like this:

import pandas as pd

df = pd.read_csv(filename)  # read_csv already returns a DataFrame

I'd like to get all the rows that have the key "sub" in the json column, and ultimately get the values for those keys.

If "add" is in a JSON sub-field, should that row be included as well? e.g. `{sub: {add: 4}}` — andrew_reece, Apr 27 '17 at 22:17
You just changed the JSON field you're searching for from "add" to "sub". That breaks all the current answers. — andrew_reece, Apr 27 '17 at 22:32
Are there only two levels to the JSON, or can there be an arbitrary number of levels? — andrew_reece, Apr 27 '17 at 22:34
There can be an arbitrary number of levels. Usually it wouldn't go past 2 levels, and very rarely past 3. I expect that most of the data I am looking for would show up within the second level. — Reimus Klinsman, Apr 27 '17 at 22:36

score 6 · Accepted Answer · edited May 23 '17 at 12:10

Here's one approach: use the read_csv converters argument to parse the json column into Python dicts as the file is read, then use apply to select rows based on the keys in each row's json field. CustomParser is taken from this answer.

EDIT
Updated to look two levels deep and to take a variable target parameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.

import json

import pandas as pd

def CustomParser(data):
    # Parse the raw JSON string from each csv cell into a Python dict
    return json.loads(data)

df = pd.read_csv('test.csv', converters={'json': CustomParser})
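A minimal self-contained sketch of the same parsing step, assuming the question's sample data with the header written without spaces after the commas (so the columns come out as letter and json). Here json.loads is passed directly as the converter, which does the same thing as wrapping it in CustomParser; io.StringIO and the name CSV_TEXT are illustrative stand-ins for test.csv.

```python
import io
import json

import pandas as pd

# Sample data from the question; io.StringIO stands in for 'test.csv'.
CSV_TEXT = '''id,"letter","json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"
'''

# read_csv calls json.loads on each raw cell string of the json column.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"json": json.loads})

print(df["json"].tolist())
# [{'add': 2}, {'sub': 5}, {'add': {'sub': 4}}]
```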

def check_keys(d, target):
    # Match at the top level of the parsed JSON
    if target in d:
        return True
    # Look exactly one level deeper, inside any nested dicts
    for key in d:
        if isinstance(d[key], dict):
            if target in d[key]:
                return True
    return False

print(df.loc[df.json.apply(check_keys, args=('sub',))])

   id letter                 json
1   2      b           {'sub': 5}
2   3      c  {'add': {'sub': 4}}
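Since the comments say the JSON can nest to an arbitrary depth, a recursive check removes the two-level limit. This is a sketch, not part of the original answer, and the name has_key_deep is hypothetical. Like check_keys, it only descends into dict values.

```python
def has_key_deep(d, target):
    """Return True if `target` is a key of dict `d` at any nesting depth."""
    if target in d:
        return True
    # Recurse into nested dicts; other value types (ints, strings, lists) are skipped.
    return any(
        has_key_deep(value, target)
        for value in d.values()
        if isinstance(value, dict)
    )


# Parsed json cells from the question's sample, plus a deeper made-up example.
rows = [{"add": 2}, {"sub": 5}, {"add": {"sub": 4}}, {"a": {"b": {"sub": 1}}}]
print([has_key_deep(r, "sub") for r in rows])
# [False, True, True, True]
```

With df parsed as above, it drops in where check_keys was used: df.loc[df.json.apply(has_key_deep, args=('sub',))]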

Psidom · Answer 2 · 2017-04-27T22:28:15.537

When you read the file in, the json field is still of str type. You can use ast.literal_eval to convert each string to a dictionary, and then use the apply method to check whether each cell contains the key "add":

from ast import literal_eval

# df comes from pd.read_csv(filename); each json cell is still a str here
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}
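One caveat with ast.literal_eval: it parses Python literals, not JSON, so a cell containing JSON's true, false, or null raises ValueError, while json.loads accepts them. A small comparison, using a made-up cell value for illustration:

```python
import ast
import json

cell = '{"sub": 5, "active": true, "note": null}'  # hypothetical cell value

# json.loads maps true -> True and null -> None.
print(json.loads(cell))
# {'sub': 5, 'active': True, 'note': None}

# literal_eval rejects the bare names true and null.
try:
    ast.literal_eval(cell)
except ValueError as exc:
    print("literal_eval failed:", type(exc).__name__)
# literal_eval failed: ValueError
```

So if the column holds strict JSON, df["json"].apply(json.loads) is the safer choice.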

If you also want to check nested keys:

def check_add(d):
    # Key present at this level
    if "add" in d:
        return True

    # Otherwise recurse into any nested dicts
    for k in d:
        if isinstance(d[k], dict):
            if check_add(d[k]):
                return True

    return False

df[df["json"].apply(check_add)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}

This only recurses into nested dictionaries, not into other containers such as lists; if your data needs that, a similar recursive approach should work.
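The question also asks for the values stored under the matching keys, which neither answer extracts. One option, sketched here as an assumption rather than either answerer's method, is a recursive generator that walks both dicts and lists and yields every value found under the target key; the name find_values is hypothetical.

```python
def find_values(obj, target):
    """Yield every value stored under key `target`, at any depth.

    Walks nested dicts and lists (e.g. arrays of objects inside the JSON).
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield value
            # Keep descending: the matched value may itself contain the key.
            yield from find_values(value, target)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, target)


rows = [{"add": 2}, {"sub": 5}, {"add": {"sub": 4}}, {"x": [{"sub": 7}, {"y": {"sub": 8}}]}]
print([list(find_values(r, "sub")) for r in rows])
# [[], [5], [4], [7, 8]]
```

Applied to the parsed column, df["json"].apply(lambda d: list(find_values(d, "sub"))) gives a list of matching values per row, empty where the key is absent.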
