3

I have a csv file where one column is json. I want to be able to access the information in the json column but I can't figure it out.

My csv file is like

id, "letter", "json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"

I'm reading in the like like

test = pd.read_csv(filename)
df = pd.DataFrame(test)

I'd like to be able to get all the rows that have "sub" in the json column and ultimately be able to get the values for those keys.

Reimus Klinsman
  • 898
  • 2
  • 12
  • 23

2 Answers2

6

Here's one approach, which uses the read_csv converters argument to build json as JSON. Then use apply to select on the json field keys in each row. CustomParser taken from this answer.

EDIT
Updated to look two levels deep, and takes variable target parameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.

def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1

df = pd.read_csv('test.csv', converters={'json':CustomParser})

def check_keys(json, target):
    if target in json:
        return True
    for key in json:
        if isinstance(json[key], dict):
            if target in json[key]:
                return True
    return False

print(df.loc[df.json.apply(check_keys, args=('sub',))])

   id letter                 json
1   2      b           {'sub': 5}
2   3      c  {'add': {'sub': 4}}
Community
  • 1
  • 1
andrew_reece
  • 20,390
  • 3
  • 33
  • 58
1

When you read the file in, the json field will still be of str type, you can use ast.literal_eval to convert the string to dictionary, and then use apply method to check if any cell contain the key add:

from ast import literal_eval
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}

In case you want to check nested keys:

def check_add(d):
    if "add" in d:
        return True

    for k in d:
        if isinstance(d[k], dict):
            if check_add(d[k]):
                return True

    return False

df[df["json"].apply(check_add)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}

This doesn't check nested values other than dictionary; If you need to, it should be similar to implement based on your data.

Psidom
  • 209,562
  • 33
  • 339
  • 356