I'm trying to figure out the max (First_Word, Group)
pairs
import pandas as pd
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
'apple fell out of the tree', 'partrige in a pear tree']},
columns=['First_Word', 'Group', 'Text'])
First_Word Group Text
0 apple apple bins where to buy apple bins
1 apple apple trees i see an apple tree
2 orange orange juice i like orange juice
3 apple apple trees apple fell out of the tree
4 pear pear tree partrige in a pear tree
Then I do a groupby
:
grouped = df.groupby(['First_Word', 'Group']).count()
Text
First_Word Group
apple apple bins 1
apple trees 2
orange orange juice 1
pear pear tree 1
And I now want to filter it down to only unique index rows that have the max Text
counts. Below you'll notice apple bins
was removed because apple trees
has the max value.
Text
First_Word Group
apple apple trees 2
orange orange juice 1
pear pear tree 1
This max value of group question is similar but when I try something like this:
df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])
I get an error: KeyError: ('Text', 'occurred at index Text')
. If I add axis=1
to the apply
I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)')