I was asked this question in an interview some time ago.
The input data has the following columns:
language, product id, shelf id, rank
For instance, the input would look like this:
English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
We would like to group by the language and shelf id columns and, within each group, sort the products by rank in descending order. Each group should produce one output record in the following format:
language, shelf_id, {product_id1:rank1, product_id2:rank2, ...}
For the given input, the output would be the following:
English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}
I solved this problem by building a dictionary whose key combines the language and shelf id, and inserting each product id and rank under its key.
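A minimal sketch of that dictionary idea, for illustration only. It assumes the key is a (language, shelf id) tuple and that the values are kept as strings so they print exactly as in the input; the names `rows` and `group_products` are made up for this post.

```python
from collections import defaultdict

# Sample input from the question: (language, product id, shelf id, rank).
rows = [
    ("English", "742005", "4560", "10.2"),
    ("English", "6000075389352", "4560", "49"),
    ("French", "899883993", "4560", "32"),
    ("French", "731317391", "7868", "81"),
]


def group_products(rows):
    """Group (product id, rank) pairs by (language, shelf id), rank descending."""
    groups = defaultdict(list)
    for language, product_id, shelf_id, rank in rows:
        # Assumption: the combined key is a tuple of language and shelf id.
        groups[(language, shelf_id)].append((product_id, rank))
    # Sort each group's products by numeric rank, highest first.
    return {
        key: sorted(products, key=lambda p: float(p[1]), reverse=True)
        for key, products in groups.items()
    }


grouped = group_products(rows)
for (language, shelf_id), products in grouped.items():
    body = ", ".join(f"{product_id}:{rank}" for product_id, rank in products)
    print(f"{language}, {shelf_id}, {{{body}}}")
```

Using a tuple as the key avoids having to pick a separator when joining the language and shelf id into one string.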
My method worked, but it looks like there may be an easier way to do this with the Python pandas library. I've read some references, but I'm still not sure whether there is a better method than mine (building a key from the language and shelf id and using a dictionary keyed on it).
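For comparison, a pandas version might look roughly like the sketch below. It is one possible way, not necessarily the idiomatic one. It assumes the data is already loaded into a DataFrame with the hypothetical column names `language`, `product_id`, `shelf_id` and `rank`, and it relies on `groupby` keeping the row order within each group, so sorting by rank first is enough.

```python
import pandas as pd

# Sample input from the question; the column names are assumptions.
df = pd.DataFrame(
    [
        ("English", 742005, 4560, 10.2),
        ("English", 6000075389352, 4560, 49),
        ("French", 899883993, 4560, 32),
        ("French", 731317391, 7868, 81),
    ],
    columns=["language", "product_id", "shelf_id", "rank"],
)

# Sort once by rank (descending); groupby preserves row order inside each group.
df_sorted = df.sort_values("rank", ascending=False)

# Build {(language, shelf_id): [(product_id, rank), ...]} with plain Python values.
grouped = {
    (language, int(shelf_id)): list(
        zip(group["product_id"].tolist(), group["rank"].tolist())
    )
    for (language, shelf_id), group in df_sorted.groupby(["language", "shelf_id"])
}

for (language, shelf_id), products in grouped.items():
    # ":g" drops the trailing ".0" that pandas adds when rank becomes a float column.
    body = ", ".join(f"{product_id}:{rank:g}" for product_id, rank in products)
    print(f"{language}, {shelf_id}, {{{body}}}")
```

Note that the `rank` column becomes a float column here because it mixes 10.2 with whole numbers, which is why the print step formats it with `:g`.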
Any help would be greatly appreciated.