I have a pandas DataFrame, say:

x y z
1 a x
1 b y
1 c z
2 a x
2 b x
3 a y
4 a z

If I want the top 2 values by x, meaning all rows for the top 2 values in the x column, the result should be:

x y z
1 a x
1 b y
1 c z
2 a x
2 b x

If I want the top 2 values by y, meaning all rows for the top 2 values in the y column, the result should be:

x y z
1 a x
1 b y
2 a x
2 b x
3 a y
4 a z

How can I achieve this?

Taylor

2 Answers

You can use:

>>> df[df['x'].isin(df['x'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
2  1  c  z
3  2  a  x
4  2  b  x

>>> df[df['y'].isin(df['y'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
3  2  a  x
4  2  b  x
5  3  a  y
6  4  a  z
Corralien
  • Hi, it works, but it changes the order of the dataframe and doesn't give the top 2 according to the previous dataframe. How can I maintain the same order and get the top 2 values? – Taylor Jan 21 '23 at 05:20
  • My output and your expected result are the same. What am I missing? – Corralien Jan 21 '23 at 07:51
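On the ordering concern raised in the comments: a boolean mask built with `isin` only selects rows, so the original row order and index should be preserved. The sketch below wraps the `value_counts` approach in a helper to show this. The function name `top_k_by_frequency` is hypothetical, and it assumes "top" means "most frequent", which matches the expected output in the question but is not stated there explicitly.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 3, 4],
    "y": ["a", "b", "c", "a", "b", "a", "a"],
    "z": ["x", "y", "z", "x", "x", "y", "z"],
})


def top_k_by_frequency(frame, col, k):
    """Keep rows whose value in `col` is among the k most frequent values.

    Hypothetical helper: "top" is assumed to mean "most frequent".
    """
    # value_counts() sorts by count (descending); head(k) keeps the k largest.
    top_values = frame[col].value_counts().head(k).index
    # isin() yields a boolean mask, so rows keep their original order.
    return frame[frame[col].isin(top_values)]


result = top_k_by_frequency(df, "y", 2)
print(result)
print(result.index.tolist())
```

The printed index stays in increasing order, so no sorting step is involved in the selection itself.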
import pandas as pd


def select_top_k(df, col, top_k):
    # Group rows by the values in `col`.
    grouping_df = df.groupby(col)
    # Take the first `top_k` group keys.
    gr_list = list(grouping_df.groups)[:top_k]

    # Keep only the groups whose key is in `gr_list`.
    temp = grouping_df.filter(lambda x: x[col].iloc[0] in gr_list)
    return temp


data = {'x': [1, 1, 1, 2, 2, 3, 4],
        'y': ['a', 'b', 'c', 'a', 'b', 'a', 'a'],
        'z': ['x', 'y', 'z', 'x', 'x', 'y', 'z']}
df = pd.DataFrame(data)

col = 'x'
top_k = 2

select_top_k(df, col, top_k)
Mazhar
  • Hi, it works, but it sorts the dataframe. I want the top values in the same dataframe without sorting. – Taylor Jan 21 '23 at 05:26
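If "without sorting" means the first k distinct values in order of appearance (an assumption; the thread does not settle this), one possible variation avoids `groupby` altogether. `groupby` sorts group keys by default, so taking the first k keys likely selects the k smallest values rather than the first k seen. The helper name `first_k_values` and the shuffled sample data below are hypothetical, chosen so the two interpretations differ.

```python
import pandas as pd


def first_k_values(frame, col, k):
    """Keep rows whose value in `col` is among the first k distinct values seen.

    Hypothetical helper: assumes "top" means "first in order of appearance".
    """
    # drop_duplicates() keeps the first occurrence of each value, in row order.
    first_values = frame[col].drop_duplicates().head(k)
    # The isin() mask selects rows without reordering them.
    return frame[frame[col].isin(first_values)]


# Illustrative data (not from the question) where x is not already sorted.
shuffled = pd.DataFrame({
    "x": [3, 1, 3, 2, 1, 4],
    "y": ["a", "b", "c", "a", "b", "a"],
})

result = first_k_values(shuffled, "x", 2)
print(result)
```

Here the first two distinct values of `x` are 3 and 1, so the rows with `x` equal to 3 or 1 are kept in their original positions.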