Python Pandas: How to split a sorted dictionary in a column of a dataframe

Question

I have a dataFrame like this:

id  asn      orgs
0   3320    {'Deutsche Telekom AG': 2288}
1   47886   {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}
2   47601   {'fusion services': 1024, 'GCE Global Maritime':16859}  
3   33438   {'Highwinds Network Group': 893}

I would like to sort the 'orgs' column, which actually holds dictionaries, and then extract the (key, value) pair with the highest value into two different columns. Like this:

id  asn      org                      value
0   3320    'Deutsche Telekom AG'     2288
1   47886   'Joyent'                  16
2   47601   'GCE Global Maritime'     16859 
3   33438   'Highwinds Network Group' 893

Currently I am running this code, but it does not sort properly, and I am not sure how to extract the pair with the highest value.

df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True))

which gave me a list like this:

id  asn      orgs
0   3320    [('Deutsche Telekom AG', 2288)]
1   47886   [('Joyent', 16),( 'Equinix (Netherlands) B.V.', 7)]
2   47601   [('GCE Global Maritime',16859),('fusion services', 1024)]   
3   33438   [('Highwinds Network Group', 893)]

Now how can I put the key and the value of the highest pair into two separate columns? Can anybody help?

Well what you're asking for is just the max value, the sorting is a bit irrelevant no? — EdChum, Apr 20 '15 at 08:49
@EdChum no because I would like to have both the key and the value in separate columns of the pair with maximum value. — UserYmY, Apr 20 '15 at 08:50

score 2 · Accepted Answer · edited May 23 '17 at 11:45

Another approach: define a function that just calls min on the dict and returns a Series, so you can assign to multiple columns (function body taken from @Alex Martelli's answer):

In [17]:

def func(x):
    k = min(x, key=x.get)
    return pd.Series([k, x[k]])
df[['orgs', 'value']] = df['orgs'].apply(func)
df

Out[17]:
     asn  id                        orgs  value
0   3320   0         Deutsche Telekom AG   2288
1  47886   1  Equinix (Netherlands) B.V.      7
2  47601   2             fusion services   1024
3  33438   3     Highwinds Network Group    893
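
The question asks for the pair with the highest value, whereas the function above uses min and so returns the lowest. A minimal sketch of the same idea with max follows; the helper name top_org and the output column names org and value are illustrative choices, not part of the original answer, and the function assumes every dict is non-empty.

```python
import pandas as pd

def top_org(orgs):
    # Key whose value is largest; raises ValueError on an empty dict.
    name = max(orgs, key=orgs.get)
    return pd.Series({"org": name, "value": orgs[name]})

df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "asn": [3320, 47886, 47601, 33438],
    "orgs": [
        {"Deutsche Telekom AG": 2288},
        {"Joyent": 16, "Equinix (Netherlands) B.V.": 7},
        {"fusion services": 1024, "GCE Global Maritime": 16859},
        {"Highwinds Network Group": 893},
    ],
})

# apply() with a Series-returning function yields one column per Series label.
result = df[["id", "asn"]].join(df["orgs"].apply(top_org))
print(result)
```

Joining onto a copy of the id and asn columns leaves the original orgs column untouched, which may be convenient if the full dictionaries are still needed later.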

EDIT

If your data has empty dicts, then you can just test the len:

In [34]:

df = pd.DataFrame({'id':[0,1,2,3,4],
                   'asn':[3320,47886,47601,33438,56],
                   'orgs':[{'Deutsche Telekom AG': 2288},
                           {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
                           {'fusion services': 1024, 'GCE Global Maritime':16859},
                           {'Highwinds Network Group': 893},{}]})
df
Out[34]:
     asn  id                                               orgs
0   3320   0                      {'Deutsche Telekom AG': 2288}
1  47886   1    {'Equinix (Netherlands) B.V.': 7, 'Joyent': 16}
2  47601   2  {'GCE Global Maritime': 16859, 'fusion service...
3  33438   3                   {'Highwinds Network Group': 893}
4     56   4                                                 {}
In [36]:

def func(x):
    if len(x) > 0:
        k = min(x, key=x.get)
        return pd.Series([k, x[k]])
    return pd.Series([np.nan, np.nan])

df[['orgs', 'value']] = df['orgs'].apply(func)
df

Out[36]:
     asn  id                        orgs  value
0   3320   0         Deutsche Telekom AG   2288
1  47886   1  Equinix (Netherlands) B.V.      7
2  47601   2             fusion services   1024
3  33438   3     Highwinds Network Group    893
4     56   4                         NaN    NaN

Thanks EdChum. I get this error: ValueError: min() arg is an empty sequence. I am guessing this is because I also have some empty cells. How can I modify it to handle this exception? — UserYmY, Apr 20 '15 at 09:14
You could test if the value is empty or wrap it in a try/except, I'll update my answer — EdChum, Apr 20 '15 at 09:19
Thanks, but it is not an empty string, it is an empty dictionary. I am still getting the error. — UserYmY, Apr 20 '15 at 09:29
OK, updated my answer, it'd be useful to fully explain what you mean by empty in the future — EdChum, Apr 20 '15 at 09:32
Thanks for the modifications. Only a tiny remark: since I wanted the maximum (as asked in the original question), I changed the min in your answer to max. — UserYmY, Apr 20 '15 at 09:36

dting · Answer 2 · 2015-04-20T09:36:18.050

This should work:

In [1]: import pandas as pd  
In [2]: import operator
In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
   ...:                      'asn' : [3320, 47886, 47601, 33438],
   ...:                      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {'Highwinds Network Group': 893}]
   ...:                    })

In [4]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0]))

In [5]: df
Out[5]:
     asn  id                     orgs  value
0   3320   0      Deutsche Telekom AG   2288
1  47886   1                   Joyent     16
2  47601   2      GCE Global Maritime  16859
3  33438   3  Highwinds Network Group    893

I used zip(* <first element of sorted dict items>) to split the (key, value) tuples into two sequences and assigned them to df.orgs and df['value'].
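
To make the zip(*...) step concrete, here is a small standalone sketch using plain Python data rather than the DataFrame. It also shows that max with the same itemgetter(1) key picks the top pair directly, so sorting the whole dict is not strictly needed; the variable names are illustrative only.

```python
from operator import itemgetter

# One (name, value) tuple per row, as produced by taking element [0]
# of each sorted list of dict items.
pairs = [
    ("Deutsche Telekom AG", 2288),
    ("Joyent", 16),
    ("GCE Global Maritime", 16859),
]

# zip(*pairs) transposes the rows: first all names, then all values.
names, values = zip(*pairs)

# max() with the same key returns the highest pair without a full sort.
orgs = {"fusion services": 1024, "GCE Global Maritime": 16859}
top_pair = max(orgs.items(), key=itemgetter(1))

print(names)
print(values)
print(top_pair)
```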

For empty dictionaries:

In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
   ...:                      'asn' : [3320, 47886, 47601, 33438],
   ...:                      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {}]
   ...:                    })
In [4]: df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('',''))
Out[4]:
0     (Deutsche Telekom AG, 2288)
1                    (Joyent, 16)
2    (GCE Global Maritime, 16859)
3                            (, )
Name: orgs, dtype: object

In [5]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('','')))

In [6]: df
Out[6]:
     asn  id                 orgs  value
0   3320   0  Deutsche Telekom AG   2288
1  47886   1               Joyent     16
2  47601   2  GCE Global Maritime  16859
3  33438   3
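
As an alternative to the empty-string placeholder, the following sketch returns NaN for empty dictionaries so the value column stays numeric. It relies on the default keyword of the built-in max (available in Python 3.4 and later), which this answer does not use; the helper name top_pair and the new column name org are assumptions made for illustration.

```python
from operator import itemgetter

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "asn": [3320, 47886, 47601, 33438],
    "orgs": [
        {"Deutsche Telekom AG": 2288},
        {"Joyent": 16, "Equinix (Netherlands) B.V.": 7},
        {"fusion services": 1024, "GCE Global Maritime": 16859},
        {},
    ],
})

def top_pair(orgs):
    # default is returned when the dict has no items, avoiding ValueError.
    return max(orgs.items(), key=itemgetter(1), default=(np.nan, np.nan))

# Transpose the per-row tuples into one sequence of names and one of values.
names, values = zip(*df["orgs"].map(top_pair))
df["org"] = list(names)
df["value"] = list(values)
print(df[["id", "asn", "org", "value"]])
```

With NaN as the placeholder, rows without any organisation can be filtered with dropna or isna instead of comparing against an empty string.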

I have the same problem here, what should I do with the orgs that have an empty dictionary? — UserYmY, Apr 20 '15 at 09:23
