I have a problem related to this question: Aggregate column values in pandas GroupBy as a dict
My input data has the following columns; for instance, the input might look like this:
language, product_id, shelf_id, rank, fiction, pages
English, 742005, 4560, 10.2, 1.0, 456
English, 6000075389352, 4560, 49, 1.0, 234
French, 899883993, 4560, 32, 0.0, 125
French, 731317391, 7868, 81, 1.0, 576
French, 235678655, 7868, 12, 1.0, 235
I would like to do a groupby on the language and shelf_id columns and get the list of the remaining attributes for each product_id. The expected output should have the following format for each grouped record:
language, shelf_id, {product_id: [rank, fiction, pages]}
For the given input, my DESIRED OUTPUT would be the following:
language, shelf_id, mapping
English, 4560, {742005: [10.2, 1.0, 456], 6000075389352: [49, 1.0, 234]}
French, 4560, {899883993: [32, 0.0, 125]}
French, 7868, {731317391: [81, 1.0, 576], 235678655: [12, 1.0, 235]}
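To make the mapping column concrete, this is how I would build the dictionary for the (English, 4560) group by hand in plain Python, using the sample rows above:

# mapping for the (English, 4560) group, built by hand from the sample rows
product_ids = [742005, 6000075389352]
attributes = [[10.2, 1.0, 456], [49, 1.0, 234]]   # [rank, fiction, pages] per product
mapping = dict(zip(product_ids, attributes))
# -> {742005: [10.2, 1.0, 456], 6000075389352: [49, 1.0, 234]}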
The solution provided in the above-mentioned post solves the problem nicely when only one column needs to go into the resulting dictionary:
import pandas as pd

# here file.csv contains only the four columns named below
df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']

# build {product_id: rank_id} for each (Lang, shelf_id) group
(df.groupby(['Lang', 'shelf_id'], as_index=False)
 .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
 .reset_index(name='mapping'))
which would yield:
Lang shelf_id mapping
0 English 4560 {742005: 10.2, 6000075389352: 49.0}
1 French 4560 {899883993: 32.0}
2 French 7868 {731317391: 81.0, 235678655: 12.0}
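Based on that pattern, the direction I have been considering is to zip product_id against the row values of the remaining columns. Below is an untested sketch; the column subset ['rank', 'fiction', 'pages'] is just my guess at how to select those values, and I am not sure this is correct or idiomatic:

import pandas as pd

df = pd.read_csv('file.csv', header=None)
df.columns = ['language', 'product_id', 'shelf_id', 'rank', 'fiction', 'pages']

# for each (language, shelf_id) group, map product_id -> [rank, fiction, pages]
out = (df.groupby(['language', 'shelf_id'])
         .apply(lambda g: dict(zip(g['product_id'],
                                   g[['rank', 'fiction', 'pages']].values.tolist())))
         .reset_index(name='mapping'))

I am also unsure whether the integer columns (e.g. pages) would survive the .values.tolist() conversion without being cast to floats.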
Could anyone help me adapt this solution to my case? Any suggestions would be appreciated.