-1

I want to sort a dataframe in pandas. I wanna do it by sorting 2 columns by value counts. One depends on the other. As seen in the image, I have achieved categorical sorting. However, I want the column 'category' to be sorted by value counts. And then the dataframe is to be sorted again based on 'beneficiary_name' under the same category.

enter image description here

This is the code I have written to achieve this till now.

data_category = data_category.sort_values(by=['category','beneficiary_name'], ascending=False)

Please help me figure this out. Thanks.

saucypanda
  • 37
  • 1
  • 8
  • 1
    Could you please provide a dataframe in your question? See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for help – Stef Dec 08 '21 at 15:42
  • 2
    assign the counts with `df['counts'] = df.groupby('category')['data'].transform('size')`. Then you can sort on `counts` and `beneficiary_name`. – Quang Hoang Dec 08 '21 at 15:45
  • @QuangHoang This works! but how do I sort it ascending only for beneficiary_name. When I try ascending true, it does it for counts as well. – saucypanda Dec 08 '21 at 16:01
  • 1
    pass an array `ascending=[True, False]`: – Quang Hoang Dec 08 '21 at 16:02
  • thank you so much! Do you wanna post this ans? So I can accept it as the correct one. – saucypanda Dec 08 '21 at 16:04
  • @saucypanda Hello, my answer below was downvoted without explication, which leaves me to wonder whether I've misunderstood the question? Could you please clarify? – Stef Dec 15 '21 at 11:40

1 Answers1

-1

Inspired by this related question:

import pandas as pd

df = pd.DataFrame({'id': range(9), 'cat': list('ababaacdc'), 'benef': list('uuuuiiiii')})

print(df)
#    id cat benef
# 0   0   a     u
# 1   1   b     u
# 2   2   a     u
# 3   3   b     u
# 4   4   a     i
# 5   5   a     i
# 6   6   c     i
# 7   7   d     i
# 8   8   c     i

df['cat_count'] = df.groupby(['cat'])['id'].transform('count')

print(df)
#    id cat benef  cat_count
# 0   0   a     u          4
# 1   1   b     u          2
# 2   2   a     u          4
# 3   3   b     u          2
# 4   4   a     i          4
# 5   5   a     i          4
# 6   6   c     i          2
# 7   7   d     i          1
# 8   8   c     i          2

df = df.sort_values(by=['cat_count', 'cat', 'benef'], ascending=False)

print(df)
#    id cat benef  cat_count
# 0   0   a     u          4
# 2   2   a     u          4
# 4   4   a     i          4
# 5   5   a     i          4
# 6   6   c     i          2
# 8   8   c     i          2
# 1   1   b     u          2
# 3   3   b     u          2
# 7   7   d     i          1
Stef
  • 13,242
  • 2
  • 17
  • 28