2

I am doing an exercise in which the current requirement is to "Find the top 10 major project themes (using column 'mjtheme_namecode')".

My first thought was to do group_by, then count and sort the groups.

However, the values in this column are lists of dicts, e.g.

[{'code': '1', 'name': 'Economic management'},
 {'code': '6', 'name': 'Social protection and risk management'}]

and I can't (apparently) group these, at least not with group_by. I get an error.

TypeError: unhashable type: 'list'

Is there a trick? I'm guessing something along the lines of this question.

(I can group by another column that has string values and matches 1:1 with this column, but the exercise is specific.)

df.head() dataframe, 10 records

Vicki B
  • 544
  • 2
  • 9
  • 20
  • are you able to show your sample output? usually you would pass the dict/json into another dataframe to expand it. – Umar.H Aug 02 '19 at 22:00
  • Possible duplicate of [flattening nested Json in pandas data frame](https://stackoverflow.com/questions/52795561/flattening-nested-json-in-pandas-data-frame) – Trenton McKinney Aug 02 '19 at 22:01

1 Answers1

1

There are two steps to solve your problem:

Using pandas==0.25

  1. Flatten the list of dict
  2. Transform dict in columns:

Step 1

df = df.explode('mjtheme_namecode')

Step 2

df = df.join(pd.DataFrame(df['mjtheme_namecode'].values.tolist())

Added: if the dict has multiple hierarchies, you can try using json_normalize:

from pandas.io.json import json_normalize

df = df.join(json_normalize(df['mjtheme_namecode'].values.tolist())

The only issue here is pd.explode will duplicate all other columns (in case that is an issue).

Using sample data:

x = [
    [1,2,[{'a':1, 'b':3},{'a':2, 'b':4}]],
    [1,3,[{'a':5, 'b':6},{'a':7, 'b':8}]]
]

df = pd.DataFrame(x, columns=['col1','col2','col3'])

Out[1]:
    col1    col2    col3
0   1       2      [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
1   1       3      [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]


## Step 1
df.explode('col3')
Out[2]:
    col1    col2    col3
0   1        2      {'a': 1, 'b': 3}
0   1        2      {'a': 2, 'b': 4}
1   1        3      {'a': 5, 'b': 6}
1   1        3      {'a': 7, 'b': 8}

## Step 2
df = df.join(pd.DataFrame(df['col3'].values.tolist()))
Out[3]:

    col1    col2    col3                a   b
0   1       2      {'a': 1, 'b': 3}     1   3
0   1       2      {'a': 2, 'b': 4}     1   3
1   1       3      {'a': 5, 'b': 6}     2   4
1   1       3      {'a': 7, 'b': 8}     2   4

## Now you can group with the new variables
realr
  • 3,652
  • 6
  • 23
  • 34