I spent a few hours searching for hints on how to do this, and tried a bunch of things (see below). I'm not getting anywhere, so I finally decided to post a new question.
I have nested JSON loaded into a Python dictionary, d, which I can print like this:
for k, v in d.items():
    print(f'{k} = {v}')
First two keys:
obj1 = {'color': 'red', 'size': 'large', 'description': 'a large red ball'}
obj2 = {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}
Side question: is this actually nested JSON? Each key (obj1, obj2) has its own set of keys, so I think it is, but I'm not sure.
I then have a dataframe like this:
df
key id_num source
obj1 143 loc1
obj2 139 loc1
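For reference, here's roughly how my example data could be reconstructed (in reality d is loaded from a JSON file and df has more rows than shown; the values below are just copied from the output above):

import pandas as pd

# The dictionary as printed above (actually loaded from a JSON file)
d = {
    'obj1': {'color': 'red', 'size': 'large', 'description': 'a large red ball'},
    'obj2': {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'},
}

# The dataframe as shown above
df = pd.DataFrame({
    'key': ['obj1', 'obj2'],
    'id_num': [143, 139],
    'source': ['loc1', 'loc1'],
})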
I want to map only 'size' and 'description' from my JSON dictionary onto this dataframe, matching by key, and I want to do that efficiently and readably. I also want it to be robust to missing keys, so that if a key from the dataframe doesn't exist in the JSON dict, it just fills in "NA" or something similar.
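To be concrete, the result I'm after would look something like this (with "NA" filled in wherever a key from df is missing from d):

key   id_num  source  size   description
obj1  143     loc1    large  a large red ball
obj2  139     loc1    small  a small blue ball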
Things I've tried that got me closest (I tried mapping one column at a time, and both at the same time):
df['description'] = df['key'].map(d['description'])
df['description'] = df['key'].map(lambda x: d[x]['description'])
df2 = df.join(pd.DataFrame.from_dict(d, orient='index', columns=['size','description']), on='key')
The first one obviously doesn't work: it raises KeyError: 'description', as expected, since the top-level keys of d are the object names, not the field names. The second one I think would work, except that my dataframe contains a key that doesn't exist in my JSON dict, so it raises KeyError: 'obj42' (an object in my df but not in d). The third one works, but it requires creating a new dataframe, which I'd like to avoid.
How can I make the second approach robust to missing keys? Also, is there a way to assign both columns at the same time without creating a new dataframe? I found a way to assign all of the dict's values here, but that's not what I want; I only want a couple of them.
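For what it's worth, the closest I've gotten to the robustness part on my own is a variation of the second attempt using .get(), though I don't know whether it's efficient or idiomatic, and it still assigns one column at a time in a loop (the 'NA' default is just a placeholder for whatever missing-value marker makes sense):

# Tweak on attempt #2: fall back to an empty dict if the key is missing,
# then to 'NA' if the field is missing, for each of the two columns I want
for col in ['size', 'description']:
    df[col] = df['key'].map(lambda x: d.get(x, {}).get(col, 'NA'))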
There's always a possibility that my search keywords were not quite right, so if a post already exists that answers this question, please let me know and I can delete this one.