24

I've got a csv that I'm reading into a pandas dataframe. However one of the columns is in the form of a dictionary. Here is an example:

ColA, ColB, ColC, ColdD
20, 30, {"ab":"1", "we":"2", "as":"3"},"String"

How can I turn this into a dataframe that looks like this:

ColA, ColB, AB, WE, AS, ColdD
20, 30, "1", "2", "3", "String"

edit I fixed up the question, it looks like this but is a string that needs to be parsed, not dict object.

user1274037
  • 395
  • 1
  • 2
  • 10
  • Are you sure that's exactly how the csv file is formatted? If so, the formatting is terrible. In order to parse it correctly, the leading spaces should be removed from the entries, the "{...}" should be enclosed in double-quotes, and there should be no double-quotes in the "{...}" itself. Is it possible to reformat the csv? If not, the best solution might be to write a python function to reformat the file, then to parse it, and finally create a DataFrame. Or do you mean you already have everything in a DataFrame? Maybe I'm still a bit confused. – McMath Mar 29 '15 at 08:17
  • Possible duplicate of [Splitting dictionary/list inside a Pandas Column into Separate Columns](http://stackoverflow.com/questions/38231591/splitting-dictionary-list-inside-a-pandas-column-into-separate-columns) – psychemedia Feb 01 '17 at 01:45

3 Answers3

40

As per https://stackoverflow.com/a/38231651/454773, you can use .apply(pd.Series) to map the dict containing column onto new columns and then concatenate these new columns back into the original dataframe minus the original dict containing column:

dw=pd.DataFrame( [[20, 30, {"ab":"1", "we":"2", "as":"3"},"String"]],
                columns=['ColA', 'ColB', 'ColC', 'ColdD'])
pd.concat([dw.drop(['ColC'], axis=1), dw['ColC'].apply(pd.Series)], axis=1)

Returns:

ColA    ColB    ColdD   ab  as  we
20      30      String  1   3   2
Community
  • 1
  • 1
psychemedia
  • 5,690
  • 7
  • 52
  • 84
  • 2
    Brilliant answer thanks. Was just right for helping me to understand how to extract the values inside a dict stored in a dataframe's column. THANK YOU. And if you ask me - the more pythonic way to do it! – kiltannen May 04 '18 at 12:23
  • I do have a question that is very similar to this one: https://stackoverflow.com/questions/51027339/dataframe-table-in-table-from-nested-dictionary and I can find nobody to answer mine, I will ask here as well. How would one have the content of the dictionary only in one cell and not spread across columns? – user637338 Jul 09 '18 at 12:57
  • @psychemedia Excellent answer! Thanks so much, this helped me with a similar case! – Uralan Aug 29 '19 at 19:13
18

So starting with your one row df

    Col A   Col B   Col C                           Col D
0   20      30      {u'we': 2, u'ab': 1, u'as': 3}  String1

EDIT: based on the comment by the OP, I'm assuming we need to convert the string first

import ast
df["ColC"] =  df["ColC"].map(lambda d : ast.literal_eval(d))

then we convert Col C to a dict, transpose it and then join it to the original df

dfNew = df.join(pd.DataFrame(df["Col C"].to_dict()).T)
dfNew

which gives you this

    Col A   Col B   Col C                           Col D   ab  as  we
0   20      30      {u'we': 2, u'ab': 1, u'as': 3}  String1 1   3   2

Then we just select the columns we want in dfNew

dfNew[["Col A", "Col B", "ab", "we", "as", "Col D"]]

    Col A   Col B   ab  we  as  Col D
0   20      30      1   2   3   String1
Bob Haffner
  • 8,235
  • 1
  • 36
  • 43
  • Thanks, but I received this error: ValueError: If using all scalar values, you must pass an index - when I try to do the dfNew step. – user1274037 Mar 30 '15 at 00:00
  • OK, try these two lines before the dfNew step. I updated my answer as well. import ast df["ColC"] = df["ColC"].map(lambda d : ast.literal_eval(d)) – Bob Haffner Mar 30 '15 at 01:03
  • Thanks for this solution. Had a string that looked like a dict, took me a while to figure out I need to convert it. – DataPlankton Nov 04 '19 at 16:00
  • Has the advantage of being much faster than using apply. – kasperhj Mar 23 '21 at 11:39
4

What about something like:

import pandas as pd

# Create mock dataframe
df = pd.DataFrame([
    [20, 30, {'ab':1, 'we':2, 'as':3}, 'String1'],
    [21, 31, {'ab':4, 'we':5, 'as':6}, 'String2'],
    [22, 32, {'ab':7, 'we':8, 'as':9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

# Create dataframe where you'll store the dictionary values
ddf = pd.DataFrame(columns=['AB','WE','AS'])

# Populate ddf dataframe
for (i,r) in df.iterrows():
    e = r['Col C']
    ddf.loc[i] = [e['ab'], e['we'], e['as']]

# Replace df with the output of concat(df, ddf)
df = pd.concat([df, ddf], axis=1)

# New column order, also drops old Col C column
df = df[['Col A', 'Col B', 'AB', 'WE', 'AS', 'Col D']]

print(df)

Output:

   Col A  Col B  AB  WE  AS    Col D
0     20     30   1   2   3  String1
1     21     31   4   5   6  String2
2     22     32   7   8   9  String2
jedwards
  • 29,432
  • 3
  • 65
  • 92