I have done a groupby which resulted in a dataframe similar to the example below.

import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'b': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'],
                   'c': ['2', '3', '4', '5', '6', '1']})

>>> df
   a   b  c
0  A  A1  2
1  A  A2  3
2  A  A3  4
3  B  B1  5
4  B  B2  6
5  B  B3  1

Desired output:

>>> df
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2

As you can see, it is a double ranking: first by group (column a), then by value within each group. We start with the highest group, which is B, and within B we start with the highest row, which is B2 (c = 6).

How can I do that in Python, please?

SBad

3 Answers

2

Use sort_values on columns a and c, both in descending order:

In [1072]: df.sort_values(by=['a', 'c'], ascending=[False, False])
Out[1072]:
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2
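
One thing to watch: in the question's example, column c holds strings, not numbers, so the sort compares text. That happens to work for single digits. Below is a minimal sketch of what changes with a two-digit value; the '10' is a hypothetical value I added for illustration, and converting with pd.to_numeric is my suggestion, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'B', 'B', 'B'],
    'b': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'],
    # Same shape as the question's data, but B3 is '10' (hypothetical)
    'c': ['2', '3', '4', '5', '6', '10'],
})

# Sorting the raw strings: '10' compares lower than '5' and '6'
as_text = df.sort_values(by=['a', 'c'], ascending=[False, False])

# Convert c to numbers first, then sort the same way
numeric = df.assign(c=pd.to_numeric(df['c']))
as_numbers = numeric.sort_values(by=['a', 'c'], ascending=[False, False])

print(as_text['b'].tolist())     # ['B2', 'B1', 'B3', 'A3', 'A2', 'A1']
print(as_numbers['b'].tolist())  # ['B3', 'B2', 'B1', 'A3', 'A2', 'A1']
```

If the real column already has a numeric dtype, the conversion step is unnecessary.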
Zero
  • Thank you for your help. "sort_values" does not work for me, so I have replaced it with df.sort(['a', 'b'], ascending=[False, False]) – SBad Mar 29 '18 at 11:18
2

You can first find the maximum of c in each group, then sort your DataFrame in descending order by that per-group maximum and by column c:

In [49]: (df.assign(x=df.groupby('a')['c'].transform('max'))
            .sort_values(['x', 'c'], ascending=[False, False])
            .drop('x', axis=1))
Out[49]:
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2
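
On the question's data this gives the same result as sorting on ['a', 'c'], so the difference is easy to miss. The sketch below uses hypothetical data (df2 and its values are my own, chosen so the two approaches disagree): sorting on column a orders groups by their label, while sorting on the per-group maximum orders groups by their largest c.

```python
import pandas as pd

# Hypothetical data: group A holds the largest c, but 'B' sorts after 'A' as a label
df2 = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B'],
    'b': ['A1', 'A2', 'B1', 'B2'],
    'c': [9, 1, 5, 6],
})

# Groups ordered by label (descending), rows by c (descending)
by_label = df2.sort_values(by=['a', 'c'], ascending=[False, False])

# Groups ordered by their maximum c (descending), rows by c (descending)
group_max = df2.groupby('a')['c'].transform('max')
by_group_max = (df2.assign(x=group_max)
                   .sort_values(['x', 'c'], ascending=[False, False])
                   .drop('x', axis=1))

print(by_label['b'].tolist())      # ['B2', 'B1', 'A1', 'A2']
print(by_group_max['b'].tolist())  # ['A1', 'A2', 'B2', 'B1']
```

If two groups could share the same maximum, their rows would interleave; adding 'a' as a middle sort key, as in ['x', 'a', 'c'], should keep each group together.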
MaxU - stand with Ukraine
  • Thank you MaxU. What's the difference from the previous answer? Are you just resetting the index, or doing something more than that? By the way, "sort_values" does not work for me – SBad Mar 29 '18 at 11:20
  • Thanks for the updated answer. Even "assign" doesn't work for me. My firm has an old Python version (which I am not allowed to upgrade) – SBad Mar 29 '18 at 11:44
  • What is your pandas version? – MaxU - stand with Ukraine Mar 29 '18 at 11:45
  • >>> pd.__version__ '0.14.1' – SBad Mar 29 '18 at 11:54
  • @SBad, there is no need to touch the system Python - [you can install the Anaconda distribution locally, create virtual environments and use those virtual environments in your projects.](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/) This is how I'm using it in my company - I have several virtual environments with different modules and/or different module versions, so they are totally independent from the system Python (usually it's Python 2.4 on old RedHat machines) and independent from each other... – MaxU - stand with Ukraine Mar 29 '18 at 11:55
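
For anyone stuck on an old pandas as in the comments above: DataFrame.assign arrived in pandas 0.16 and DataFrame.sort_values in 0.17, while the older DataFrame.sort was removed in 0.20. Here is a rough sketch of the group-maximum approach that avoids assign and falls back to sort when sort_values is missing. The helper name sort_by_group_max and the temporary column name are hypothetical, and I have only reasoned about the old-version branch, not run it on 0.14.1.

```python
import pandas as pd

def sort_by_group_max(df, group_col, value_col):
    """Hypothetical helper: order groups by their largest value,
    then rows by value, both descending."""
    tmp = df.copy()
    # Plain column assignment instead of DataFrame.assign (pandas >= 0.16)
    tmp['_group_max'] = tmp.groupby(group_col)[value_col].transform('max')
    keys = ['_group_max', value_col]
    if hasattr(tmp, 'sort_values'):
        out = tmp.sort_values(by=keys, ascending=[False, False])
    else:
        # Pre-0.17 spelling; assumed to behave the same on old versions
        out = tmp.sort(keys, ascending=[False, False])
    return out.drop('_group_max', axis=1)

df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'B', 'B', 'B'],
    'b': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'],
    'c': [2, 3, 4, 5, 6, 1],  # integers here, unlike the question's strings
})

result = sort_by_group_max(df, 'a', 'c')
print(result['b'].tolist())  # ['B2', 'B1', 'B3', 'A3', 'A2', 'A1']
```

The copy keeps the caller's DataFrame untouched, at the cost of some extra memory on large frames.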
2

I think you need to first get the maximum values by aggregating, then create an ordered Categorical whose categories follow the order of those maxima, and finally sort_values will work as you need:

c = df.groupby('a')['c'].max().sort_values(ascending=False)
print (c)
a
B    6
A    4
Name: c, dtype: object

df['a'] = pd.Categorical(df['a'], categories=c.index, ordered=True)
df = df.sort_values(by=['a', 'c'], ascending=[True, False])
print (df)
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2
jezrael