0

Let's say, I have number A and they call several people B

A   B

123 987
123 987
123 124
435 567
435 789
653 876
653 876
999 654
999 654
999 654
999 123

I want to find to whom the person in A has called maximum times and also the number of times.

OUTPUT:

A   B           Count
123 987         2
435 567 or789   1
653 876         2
999 654         3

How one can think of it is,

A      B
123    987 2
       124 1


435    567 1
       789 1

653    876 2

999    654 3
       123 1

Can somebody help me out on how to do this?

2 Answers2

2

Try this

# count the unique values in rows
df.value_counts(['A','B']).sort_index()
A    B  
123  124    1
     987    2
435  567    1
     789    1
653  876    2
999  123    1
     654    3
dtype: int64

To get the highest values for each unique A:

v = df.value_counts(['A','B'])
# remove duplicated rows
v[~v.reset_index(level=0).duplicated('A').values]
A    B  
999  654    3
123  987    2
653  876    2
435  567    1
dtype: int64
cottontail
  • 10,268
  • 18
  • 50
  • 51
1

Use SeriesGroupBy.value_counts which by default sorting values, so get first rows per A by GroupBy.head:

df = df.groupby('A')['B'].value_counts().groupby(level=0).head(1).reset_index(name='Count')
print (df)
     A    B  Count
0  123  987      2
1  435  567      1
2  653  876      2
3  999  654      3

Another idea:

df = df.value_counts(['A','B']).reset_index(name='Count').drop_duplicates('A')
print (df)
     A    B  Count
0  999  654      3
1  123  987      2
2  653  876      2
4  435  567      1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252