Python Pandas, one dict column, create new rows for each key/value pair

Question

I have a Pandas DataFrame with about 500000 lines in the following format:

**ID  Name  Tags**
4345  Bill  {'circle:blue', 'background:orange', 'Type':12}

For more straightforward data analysis I would like to convert to:

**ID   Name  Key         Value** 
4345   Bill  Circle      Blue
4345   Bill  Background  Orange
4345   Bill  Type        12

I have found an answer that can split one key/value per row: Python Pandas: How to split a sorted dictionary in a column of a dataframe, but I have failed miserably to extend this to perform my requirement above.

I could probably manage this with some standard loops, but I'm hoping there's an elegant and efficient Pandas approach?

score 3 · Accepted Answer · edited May 23 '17 at 12:23

3

Based on this answer, you can do something similar:

>>> df_tags = df.apply(lambda x: pd.Series(x['Tags']),axis=1).stack().reset_index(level=1, drop=False)
>>> df_tags.columns = ['Key', 'Value']
>>> df_tags
          Key   Value
0        Type      12
0  background  orange
0      circle    blue
>>> df.drop('Tags', axis=1).join(df_tags)
     ID  Name         Key   Value
0  4345  Bill        Type      12
0  4345  Bill  background  orange
0  4345  Bill      circle    blue

edited May 23 '17 at 12:23

Community

1
1

answered Mar 25 '16 at 10:16

Roman Pekar

107,110
28
195
197

The solution worked very well with a small test data set, but my computer went into meltdown on a 0.5M row table. I've reverted to doing this in Postgresql and importing the results to Pandas. – stephen Mar 26 '16 at 04:32

Python Pandas, one dict column, create new rows for each key/value pair

1 Answers1