Aggregate column values in pandas GroupBy as a dict

Question

I was asked this question in an interview some time ago.

The input data has the following columns:

language, product id, shelf id, rank

For instance, the input would look like this:

English, 742005, 4560, 10.2 
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81

We would like to group by the language and shelf id columns and sort the products in each group by "rank" in descending order, which would result in output of the following format:

Language, shelf_id, {product_id:rank1, product_id:rank2 ....}

with one such line for each (language, shelf id) group.

For the given input, the output would be the following:

English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}

I solved this problem by building a dictionary whose key combines the language and shelf id, and inserting each (product id, rank) pair under the matching key.
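A minimal sketch of that dictionary-based approach, for reference. The function name `group_products` and the tuple layout of `rows` are assumptions made for illustration; the original interview code is not shown in the question.

```python
from collections import defaultdict

# Sample rows as (language, product_id, shelf_id, rank) tuples (assumed layout).
rows = [
    ("English", 742005, 4560, 10.2),
    ("English", 6000075389352, 4560, 49),
    ("French", 899883993, 4560, 32),
    ("French", 731317391, 7868, 81),
]

def group_products(rows):
    """Map (language, shelf_id) to {product_id: rank}, highest rank first."""
    grouped = defaultdict(list)
    for language, product_id, shelf_id, rank in rows:
        grouped[(language, shelf_id)].append((product_id, rank))
    # dicts keep insertion order (Python 3.7+), so sort the pairs
    # by rank, descending, before building each dict.
    return {
        key: dict(sorted(pairs, key=lambda pair: pair[1], reverse=True))
        for key, pairs in grouped.items()
    }

result = group_products(rows)
for (language, shelf_id), mapping in result.items():
    print(language, shelf_id, mapping)
```

Using a tuple as the key avoids having to concatenate the language and shelf id into a string.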

My method worked, but it looks like there is an easier way to do this with the pandas library. I've read some references, but I'm still not sure whether there is a better method than mine (building a dictionary keyed on the language and shelf id).

Any help would be greatly appreciated.

cs95 · Accepted Answer · 2019-01-11T16:50:26.630


Setup

import pandas as pd

df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']

df
      Lang     product_id  shelf_id  rank_id
0  English         742005      4560     10.2
1  English  6000075389352      4560     49.0
2   French      899883993      4560     32.0
3   French      731317391      7868     81.0

You can use df.groupby to group by Lang and shelf_id, then call apply on the grouped object to build a {product_id: rank_id} dictionary for each group:

(df.groupby(['Lang', 'shelf_id'], as_index=False)
   .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
   .reset_index(name='mapping'))

      Lang  shelf_id                              mapping
0  English      4560  {6000075389352: 49.0, 742005: 10.2}
1   French      4560                    {899883993: 32.0}
2   French      7868                    {731317391: 81.0}
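The snippet above does not sort explicitly, and a later comment reports that the as_index=False argument caused trouble on a newer pandas version. A hedged variant, assuming a reasonably recent pandas and Python 3.7+ (where dicts keep insertion order), sorts by rank_id before grouping and leaves as_index at its default. The inline DataFrame stands in for the CSV file and is only for illustration.

```python
import pandas as pd

# Inline stand-in for the CSV data used in the setup.
df = pd.DataFrame({
    "Lang": ["English", "English", "French", "French"],
    "product_id": [742005, 6000075389352, 899883993, 731317391],
    "shelf_id": [4560, 4560, 4560, 7868],
    "rank_id": [10.2, 49.0, 32.0, 81.0],
})

# Sort by rank first; groupby keeps the row order within each group,
# so each dict is built from highest to lowest rank.
result = (
    df.sort_values("rank_id", ascending=False)
      .groupby(["Lang", "shelf_id"])[["product_id", "rank_id"]]
      .apply(lambda g: dict(zip(g["product_id"], g["rank_id"])))
      .reset_index(name="mapping")
)
print(result)
```

Selecting only product_id and rank_id before apply keeps the grouping columns out of the frame passed to the lambda, since the lambda does not need them.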

edited Jan 11 '19 at 16:50

answered Jul 19 '17 at 23:55

cs95


Thanks for the answer, but could you also explain how you read the text input into a data frame like that? – user98235 Jul 20 '17 at 00:07
@user98235 Edited my post with setup info. I assumed your data is in a csv file. – cs95 Jul 20 '17 at 00:11
@COLDSPEED thanks for the answer, but can you also tell me what to do if the data is not in a file and is just given as input? For instance, I could simply type it in. – user98235 Jul 20 '17 at 00:13
@user98235 This link may be of use to you, if you don't want to read from a file: https://stackoverflow.com/a/22605281/4909087 – cs95 Jul 20 '17 at 00:14
I could not get this to work. Removing the as_index argument in groupby made it useable. – Jonathan Biemond Mar 10 '22 at 16:04
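On the question in the comments about typing the data in rather than reading it from a file: one common approach, sketched here as an assumption rather than a summary of the linked answer, is to wrap the text in io.StringIO and pass that to pd.read_csv. The variable name `raw` is hypothetical.

```python
import io

import pandas as pd

# The sample input typed directly into the script.
raw = """English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
"""

# StringIO makes the string behave like an open file.
# skipinitialspace drops the blank that follows each comma.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["Lang", "product_id", "shelf_id", "rank_id"],
    skipinitialspace=True,
)
print(df)
```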