I'm currently trying to pull the value out of several dicts appearing in a series of columns, there are two issues:
Since there are 4 columns in question they were unpacked from a previous dict-in-column value via this line of code:
df = pd.concat([df.drop(['ids'], axis = 1), df['ids'].apply(pd.Series)], axis = 1)
What this dict was unpack a dict in a column of the form:
d = {'a': {'id': 12}, 'b': {'id': 13}, 'c': {'id': 14}, 'd': {'id': 15}}
The dict d
being of length between 0-4.
Before unpacking the dataframe the column I unpacked looked like this:
ids
406 {'a': {'id': '12'}}
408 None
409 {'a': {'id': '21'}, 'b': {'id': '23'}}
417 {'a': {'id': '53'}, 'b': {'id': '98'}, 'c': {'id': '45'}}
419 None
After Unpacking it now has the form
a b c
408 None {'id': '12'} None
409 {'id': '32'} {'id': '45'} {'id': '36'}
417 {'id': '09'} {'id': '31'} None
While that initially solved my first problem, I'm now trying to pull the values out of columns that have the dictionaries in them, and I'm kind of at a loss for this.
Potential solutions I've tried are just running the snippet above for each column (a,b,c), however that is both ugly and inefficient. At most I know an easy fix would be to pd.json_normalize
the initial dataframe when I first start my program, however that would require a significant fix and refactor for something that seems that it could be solved trivially. For reference the ideal output would be this:
a b c
408 None 12 None
409 32 45 36
417 09 31 None
And the whole dataframe is several hundred thousand rows, with 20 columns that are in flux.