Group dataframe by multiple columns

Question

I want to group a Pandas Dataframe by multiple columns. Each Row has an integer, a Name, and an additional numerical value. I want the final Dataframe to include every row in which the Name has the highest integer.

values = {'Int': [1,1,1,2,2,1],
          'Name': ['Tom', 'Jim', 'Jan','Tom', 'Tom', 'Lucas'],
          'Bill':[0.5,0.2,0.2,0.7, 0.8, 0.2]}

df = pd.DataFrame.from_dict(values)

   Int   Name  Bill
0    1    Tom   0.5
1    1    Jim   0.2
2    1    Jan   0.2
3    2    Tom   0.7
4    2    Tom   0.8
5    1  Lucas   0.2

By grouping the dataframe only the 0th row should disappear. Row 3 and 4 should still be included in the dataframe.

In this specific case, since you want to return **all** rows that match, you should use `transform` to create a Boolean mask (Zelazny7's answer in the dup). The others, which use `sort` + `drop_duplicates` or `.idxmax` will return at most one row, which isn't what you want here. — ALollz, Apr 08 '19 at 15:49
df_grouped = df.groupby('Int').agg(lambda x: list(x)).max() Above code will return the value of max in Int column. Suggestion : Refer pandas's methods official guideline/page. — aman nagariya, Apr 08 '19 at 15:52

score 2 · Answer 1 · answered Apr 08 '19 at 15:46

2

IIUC, you can compare Int column with max Int for each Name, like this:

df[df['Int'] == df.groupby('Name')['Int'].transform('max')]

output:
    Int Name    Bill
1   1   Jim 0.2
2   1   Jan 0.2
3   2   Tom 0.7
4   2   Tom 0.8
5   1   Lucas   0.2

answered Apr 08 '19 at 15:46

Terry

2,761
2
14
28

Group dataframe by multiple columns

1 Answers1