double ranking in python after a groupby

Question

I have done a groupby which resulted in a dataframe similar to the example below.

import pandas as pd
df = pd.DataFrame({'a': ['A', 'A', 'A', 'B', 'B', 'B'], 'b': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'], 'c': ['2', '3', '4', '5', '6', '1']})

>>> df
   a   b  c
0  A  A1  2
1  A  A2  3
2  A  A3  4
3  B  B1  5
4  B  B2  6
5  B  B3  1

Desired output:

>>> df
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2

As you can see, it is a double ranking based on column a, then column b. We first start with the highest group, which is B, and within B we also start with the highest row, which is B2.

How can I do that in Python, please?

What happens if `2 A A3 4` is changed to `2 A A3 7`? – jezrael Mar 29 '18 at 11:11

score 2 · Answer 1 · answered Mar 29 '18 at 11:05

Use

In [1072]: df.sort_values(by=['a', 'c'], ascending=[False, False])
Out[1072]:
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2

Zero

Thank you for your help. "sort_values" does not work for me, so I have replaced it with df.sort(['a', 'b'], ascending=[False, False]) – SBad Mar 29 '18 at 11:18

MaxU - stand with Ukraine · Accepted Answer · 2018-03-29T11:20:56.407

2

You can first find the maximum of c in each group, then sort your DF in descending order by this group maximum and by column c:

In [49]: (df.assign(x=df.groupby('a')['c'].transform('max'))
            .sort_values(['x','c'], ascending=[False, False])
            .drop('x', axis=1))
Out[49]:
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2
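
To see how this differs from sorting on a and c directly, consider the variation raised in the comments on the question, where row 2 becomes `A A3 7`. The following is only a minimal sketch, assuming a recent pandas in which `assign` and `sort_values` are available. Column c is stored as integers here for simplicity (an assumption; the question uses strings), and the names `by_label` and `by_group_max` are made up for illustration.

```python
import pandas as pd

# Same frame as in the question, but with row 2 changed to c == 7.
# Assumption: c is numeric here, whereas the question stores strings.
df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'B', 'B', 'B'],
    'b': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'],
    'c': [2, 3, 7, 5, 6, 1],
})

# Sorting on the label itself: group B comes first because 'B' > 'A',
# regardless of the values in c.
by_label = df.sort_values(by=['a', 'c'], ascending=[False, False])

# Sorting on the per-group maximum of c: group A comes first now,
# because its maximum (7) exceeds the maximum of B (6).
by_group_max = (df.assign(x=df.groupby('a')['c'].transform('max'))
                  .sort_values(['x', 'c'], ascending=[False, False])
                  .drop('x', axis=1))

print(by_label)
print(by_group_max)
```

With the original data both approaches give the same order, since the group with the larger maximum also happens to have the larger label.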

edited Mar 29 '18 at 11:20

answered Mar 29 '18 at 11:11

MaxU - stand with Ukraine

Thank you MaxU. What's the difference from the previous answer? Are you just resetting the index, or doing something more than that? By the way, "sort_values" does not work for me – SBad Mar 29 '18 at 11:20
Thanks for the updated answer. Even "assign" doesn't work for me. My firm has an old Python version (which I am not allowed to upgrade) – SBad Mar 29 '18 at 11:44
What is your pandas version? – MaxU - stand with Ukraine Mar 29 '18 at 11:45
>>> pd.__version__ '0.14.1' – SBad Mar 29 '18 at 11:54
@SBad, there is no need to touch the system Python: [you can install the Anaconda distribution locally, create virtual environments, and use them in your projects.](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/) This is how I use it at my company: I have several virtual environments with different modules and/or different module versions, so they are totally independent of the system Python (usually Python 2.4 on old RedHat machines) and independent of each other... – MaxU - stand with Ukraine Mar 29 '18 at 11:55

jezrael · Answer 3 · 2018-03-29T11:26:56.690

I think you first need to get the max value of each group by aggregating, then create an ordered Categorical whose categories follow the index of those sorted maxima; after that, sort_values works as you need:

c = df.groupby('a')['c'].max().sort_values(ascending=False)
print (c)
a
B    6
A    4
Name: c, dtype: object

df['a'] = pd.Categorical(df['a'], categories=c.index, ordered=True)
df = df.sort_values(by=['a', 'c'], ascending=[True, False])
print (df)
   a   b  c
4  B  B2  6
3  B  B1  5
5  B  B3  1
2  A  A3  4
1  A  A2  3
0  A  A1  2
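
One caveat, offered as a side note rather than as part of the answer: in the question's frame, column c holds strings (hence the `dtype: object` above), so maxima and sort order are lexicographic. That happens to match numeric order for single digits, but not in general, since '10' compares as smaller than '9'. The sketch below assumes the values are meant as numbers and uses made-up data with one two-digit value; it converts the column with `pd.to_numeric` before applying the same Categorical approach. The names `group_max` and `result` are hypothetical.

```python
import pandas as pd

# Hypothetical data (not from the question): one value has two digits.
df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B'],
    'b': ['A1', 'A2', 'B1', 'B2'],
    'c': ['9', '3', '10', '5'],
})

# Plain Python string comparison is lexicographic, so '9' beats '10'.
print(max('9', '10'))

# Convert to numbers so that 10 compares as greater than 9.
df['c'] = pd.to_numeric(df['c'])

# Group maxima, largest first; the index gives the desired group order.
group_max = df.groupby('a')['c'].max().sort_values(ascending=False)

# Ordered Categorical with categories in that order, then sort:
# ascending on the categorical, descending on c within each group.
df['a'] = pd.Categorical(df['a'], categories=group_max.index, ordered=True)
result = df.sort_values(by=['a', 'c'], ascending=[True, False])
print(result)
```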
