How to get the top n values in a pandas DataFrame if it has repeated values

Question

I have a pandas DataFrame, say:

x	y	z
1	a	x
1	b	y
1	c	z
2	a	x
2	b	x
3	a	y
4	a	z

If I wanted the top 2 values by x, I mean all rows for the top 2 values in the x column, which gives:

x	y	z
1	a	x
1	b	y
1	c	z
2	a	x
2	b	x

If I wanted the top 2 values by y, I mean all rows for the top 2 values in the y column, which gives:

x	y	z
1	a	x
1	b	y
2	a	x
2	b	x
3	a	y
4	a	z

How can I achieve this?

@mozway I don't think it's the right answer but maybe I'm wrong :-) — Corralien, Jan 20 '23 at 08:36
@Corralien maybe you're right, in any case OP should put more effort in describing the logic! — mozway, Jan 20 '23 at 08:39

score 1 · Answer 1 · answered Jan 20 '23 at 08:37

You can use:

>>> df[df['x'].isin(df['x'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
2  1  c  z
3  2  a  x
4  2  b  x

>>> df[df['y'].isin(df['y'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
3  2  a  x
4  2  b  x
5  3  a  y
6  4  a  z
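As a self-contained sketch of the same idea, the snippet below builds the DataFrame from the question and wraps the expression in a helper. The name `top_n_by_frequency` is hypothetical and used only for illustration. It assumes "top" means "most frequent", which is what `value_counts()` ranks by.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 3, 4],
    "y": ["a", "b", "c", "a", "b", "a", "a"],
    "z": ["x", "y", "z", "x", "x", "y", "z"],
})


def top_n_by_frequency(df, col, n):
    """Hypothetical helper: keep rows whose `col` value is among the n most frequent."""
    # value_counts() orders values by descending count; head(n) keeps the first n.
    top_values = df[col].value_counts().head(n).index
    # Boolean-mask selection returns the matching rows in their original order.
    return df[df[col].isin(top_values)]


top_x = top_n_by_frequency(df, "x", 2)
top_y = top_n_by_frequency(df, "y", 2)
print(top_x)
print(top_y)
```

Because the rows are selected with a boolean mask, they come back in the same order as in `df`. If several values tie in frequency at the cutoff, `head(n)` still keeps exactly n of them, so which tied value survives may not be what you expect.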

Corralien


Hi, it works, but it changes the order of the DataFrame and doesn't give the top 2 according to the previous DataFrame. How do I maintain the same order and get the top 2 values? – Taylor Jan 21 '23 at 05:20
My output and your expected result are the same. What am I missing? – Corralien Jan 21 '23 at 07:51

score 0 · Answer 2 · answered Jan 20 '23 at 09:16

import pandas as pd


def select_top_k(df, col, top_k):
    # Group the rows by the values in `col`.
    grouping_df = df.groupby(col)
    # Take the first `top_k` group keys as listed by `.groups`.
    gr_list = list(grouping_df.groups)[:top_k]

    # Keep only the rows whose group key is among the selected keys.
    temp = grouping_df.filter(lambda x: x[col].iloc[0] in gr_list)
    return temp


data = {'x': [1, 1, 1, 2, 2, 3, 4],
        'y': ['a', 'b', 'c', 'a', 'b', 'a', 'a'],
        'z': ['x', 'y', 'z', 'x', 'x', 'y', 'z']}
df = pd.DataFrame(data)

col = 'x'
top_k = 2

select_top_k(df, col, top_k)
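If "top" is read as "the first n distinct values in sorted order" rather than "the most frequent", a mask-based variant without `groupby` might look like the sketch below. The helper name `first_n_distinct` is hypothetical, and the sorted-order reading is an assumption, since the question does not spell out the ranking rule.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 3, 4],
    "y": ["a", "b", "c", "a", "b", "a", "a"],
    "z": ["x", "y", "z", "x", "x", "y", "z"],
})


def first_n_distinct(df, col, n):
    """Hypothetical helper: keep rows whose `col` value is among the n smallest distinct values."""
    # Distinct values of the column, sorted ascending, limited to the first n.
    keys = df[col].drop_duplicates().sort_values().head(n)
    # The boolean mask selects matching rows without reordering them.
    return df[df[col].isin(keys)]


by_x = first_n_distinct(df, "x", 2)
by_y = first_n_distinct(df, "y", 2)
print(by_x)
print(by_y)
```

On the sample data this selects the same rows as the frequency-based reading, so the example in the question alone does not tell the two interpretations apart.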

Mazhar


Hi, it works, but it sorts the DataFrame. I want the top values in the same DataFrame without sorting. – Taylor Jan 21 '23 at 05:26
