1

I have the following data frame that has been obtained by applying df.groupby(['category', 'unit_quantity']).count()

category unit_quantity Count
banana 1EA 5
eggs 100G 22
100ML 1
full cream milk 100G 5
100ML 1
1L 38

Let's call this latter dataframe as grouped. I want to find a way to regroup using columns unit_quantity and Count it and get

category unit_quantity Count Most Frequent unit_quantity
banana 1EA 5 1EA
eggs 100G 22 100G
100ML 1 100G
full cream milk 100G 5 1L
100ML 1 1L
1L 38 1L

Now, I tried to apply grouped.groupby(level=1).max() which gives me

unit_quantity
100G 22
100ML 1
1EA 5
1L 38

Now, because the indices of the latter and grouped do not coincide, I cannot join it using .merge. Does someone know how to solve this issue?

Thanks in advance

ℂybernetician
  • 135
  • 2
  • 10

1 Answers1

1

Starting from your DataFrame :

>>> import pandas as pd

>>> df = pd.DataFrame({'category': ['banana', 'eggs', 'eggs', 'full cream milk', 'full cream milk', 'full cream milk'], 
...                    'unit_quantity': ['1EA', '100G', '100ML', '100G', '100ML', '1L'], 
...                    'Count': [5, 22, 1, 5, 1, 38],}, 
...                   index = [0, 1, 2, 3, 4, 5]) 
>>> df
    category    unit_quantity   Count
0   banana                1EA       5
1   eggs                 100G      22
2   eggs                100ML       1
3   full cream milk      100G       5
4   full cream milk     100ML       1
5   full cream milk        1L      38

You can use the transform method applied on max of the column Count in order to keep your category and unit_quantity values :

>>> idx = df.groupby(['unit_quantity'])['Count'].transform(max) == df['Count']
>>> df[idx]
    category    unit_quantity   Count
0   banana                1EA       5
1   eggs                 100G      22
2   eggs                100ML       1
4   full cream milk     100ML       1
5   full cream milk        1L      38
tlentali
  • 3,407
  • 2
  • 14
  • 21