Find most frequent observation in group

Question

DataFrame:

B = pd.DataFrame({'b':['II','II','II','II','II','I','I','I'],
                  'MOST_FREQUENT':['1', '2', '2', '1', '1','1','2','2']})

I need to get the most frequent value in the MOST_FREQUENT column for each group:

pd.DataFrame({'b':['I','II'],
              'MOST_FREQUENT':['2','1']})

The only clue I found is mode(), but it is not applicable to DataFrameGroupBy.

EDIT: I need a solution that works with pandas' .agg() function.

jezrael · Accepted Answer · 2017-04-20T16:32:59.803

2

You can use apply:

print (B.groupby('b')['MOST_FREQUENT'].apply(lambda x: x.mode())
        .reset_index(level=1, drop=True).reset_index())
    b MOST_FREQUENT
0   I             2
1  II             1

Another solution is to use Series.value_counts inside apply and return the first index value, because value_counts sorts by count in descending order:

print (B.groupby('b')['MOST_FREQUENT'].apply(lambda x: x.value_counts().index[0])
        .reset_index())
    b MOST_FREQUENT
0   I             2
1  II             1
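One thing worth checking with the two approaches above is what happens on ties. The sketch below uses a small made-up Series (`tied` is a hypothetical example, not data from the question) to show that `mode()` returns every tied value, while `value_counts().index[0]` returns only one of them.

```python
import pandas as pd

# Hypothetical data with a tie: '1' and '2' both appear twice.
tied = pd.Series(['1', '2', '2', '1'])

# mode() returns all tied values, sorted, so a group can yield several rows.
modes = tied.mode()

# value_counts().index[0] yields exactly one of the tied values.
first = tied.value_counts().index[0]

print(modes.tolist())
print(first in modes.tolist())
```

If a group must map to a single value, picking one element explicitly (for example the first mode) avoids getting extra rows for tied groups.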

EDIT: To make it work with .agg(), you can use Counter.most_common:

from collections import Counter
print (B.groupby(['b']).agg(lambda x: Counter(x).most_common(1)[0][0]).reset_index())
    b MOST_FREQUENT
0   I             2
1  II             1
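As an alternative that also fits `.agg()`, here is a minimal sketch built on `Series.mode`. The helper name `most_frequent` is an assumption made for illustration, and taking the first mode on ties and returning `None` for an empty group are choices of this sketch, not part of the original answer.

```python
import pandas as pd

B = pd.DataFrame({'b': ['II', 'II', 'II', 'II', 'II', 'I', 'I', 'I'],
                  'MOST_FREQUENT': ['1', '2', '2', '1', '1', '1', '2', '2']})

def most_frequent(s):
    # Hypothetical helper: return a single scalar so it is usable in .agg().
    modes = s.mode()          # all most frequent values, sorted
    if modes.empty:           # e.g. a group containing only NaN
        return None
    return modes.iat[0]       # on ties, take the first (smallest) mode

# The helper can be passed per column, like any other aggregation function.
result = B.groupby('b').agg({'MOST_FREQUENT': most_frequent}).reset_index()
print(result)
```

Because the helper always returns a scalar, it can sit next to other aggregations in the same dict passed to `.agg()`.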

edited Apr 20 '17 at 16:32

answered Apr 20 '17 at 15:17

jezrael


How can I make your `lambda` work in an `.agg()` function? – Ladenkov Vladislav Apr 20 '17 at 15:54
You can try `print (B.groupby('b').agg(lambda x: x.value_counts().index[0]))` – jezrael Apr 20 '17 at 16:10
But now I am offline, only on phone, so cannot test. – jezrael Apr 20 '17 at 16:11
Unfortunately, still not. It says: `IndexError: index 0 is out of bounds for axis 0 with size 0` – Ladenkov Vladislav Apr 20 '17 at 16:17
Check the edited answer. – jezrael Apr 20 '17 at 16:34

score 2 · Answer 2 · answered Apr 20 '17 at 15:39


To squeeze a little more performance out of pandas, we can use groupby with size to get the count of each (MOST_FREQUENT, b) pair, then use idxmax within each b group to find the index label of the largest count. These index labels contain the values we're looking for.

s = B.groupby(['MOST_FREQUENT', 'b']).size()
pd.DataFrame(
    s.groupby(level='b').idxmax().values.tolist(),
    columns=s.index.names
)

  MOST_FREQUENT   b
0             2   I
1             1  II
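To make the role of the two-column grouping clearer, the same steps can be split apart as in the sketch below. The intermediate names `winners` and `result` are introduced here for illustration only.

```python
import pandas as pd

B = pd.DataFrame({'b': ['II', 'II', 'II', 'II', 'II', 'I', 'I', 'I'],
                  'MOST_FREQUENT': ['1', '2', '2', '1', '1', '1', '2', '2']})

# Step 1: group by both columns so each (MOST_FREQUENT, b) pair gets a count.
s = B.groupby(['MOST_FREQUENT', 'b']).size()
print(s)

# Step 2: within each b, idxmax returns the index label with the largest
# count. The label is a (MOST_FREQUENT, b) tuple because s has a MultiIndex.
winners = s.groupby(level='b').idxmax()
print(winners)

# Step 3: unpack the tuples into columns named after the index levels.
result = pd.DataFrame(winners.tolist(), columns=s.index.names)
print(result)
```

Grouping by both columns is what produces the per-value counts; the second groupby then only has to pick the largest count per `b`.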

naive timing (timing figure not preserved)

answered Apr 20 '17 at 15:39

piRSquared


How can I use it in an `.agg()` function? Why do you group by both columns? – Ladenkov Vladislav Apr 20 '17 at 15:57
