How to sort data in Dataframe by the certain condition?

Question

Given approximate Dataframe

df = pd.DataFrame({
    'Name': ['Max', 'Stefan', 'John', 'Kate', 'Walter', 'Karin', 'Julia', 'Ben', 'Spencer'],
    'Balance': [0.12, 0.03, 0.12, 0.12, 0.12, 0.03, 0.03, 0.06, 0.03]})

      Name  Balance
0      Max     0.12
1   Stefan     0.03
2     John     0.12
3     Kate     0.12
4   Walter     0.12
5    Karin     0.03
6    Julia     0.03
7      Ben     0.06
8  Spencer     0.03

The condition of the sorting is to create mini-groups by 3 rows. These rows would be: first(max value of the column), second(the value which is less than max val and bigger than min val), third(min value of the column). But the nuance is that this value(second) can be equal to max value or equal to min value or just max > average > min. Also, it's important to note that all values can be equal. So there are four different outcomes are possible.

I need to sort this Dataframe in such a way:

      Name  Balance
0      Max     0.12
1      Ben     0.06
2    Julia     0.03
3     Kate     0.12
4   Walter     0.12
5    Karin     0.03
6     John     0.12
7   Stefan     0.03
8  Spencer     0.03

That's not really sorting if you're changing the data. Or is it an mistake that the output has one of the 0.03's replaced with a 0.12? You can [edit] to fix it or clarify. — wjandrea, Mar 29 '23 at 00:28
BTW, welcome to Stack Overflow! Check out the [tour], and [How to ask a good question](/help/how-to-ask) if you want tips. You might also want to read [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Mar 29 '23 at 00:28

score 1 · Accepted Answer · answered Mar 29 '23 at 01:18

You can sort by balance first, make an additional column for the order and use it to reorder the dataframe:

df.sort_values('Balance', ascending=False, inplace=True)

df['Order'] = sum((list(range(i, len(df), 3)) for i in range(3)), [])

df.sort_values('Order', inplace=True)

del df['Order']

Result:

      Name  Balance
0      Max     0.12
4   Walter     0.12
5    Karin     0.03
2     John     0.12
7      Ben     0.06
6    Julia     0.03
3     Kate     0.12
1   Stefan     0.03
8  Spencer     0.03

score 0 · Answer 2 · answered Mar 29 '23 at 01:04

There's too much nuance in your question to answer it properly, but this may give you something similar:

max_mask = df.Balance.eq(df.Balance.max())
min_mask = df.Balance.eq(df.Balance.min())
max_vals = df[max_mask]
min_vals = df[min_mask]
other = df[~max_mask & ~min_mask].sort_values(by='Balance', ascending=False)

output = pd.concat([
    out.reset_index(drop=True) for out in (max_vals, other, min_vals)
]).sort_index()

print(output)

Output:

      Name  Balance
0      Max     0.12
0      Ben     0.06
0   Stefan     0.03
1     John     0.12
1    Karin     0.03
2     Kate     0.12
2    Julia     0.03
3   Walter     0.12
3  Spencer     0.03

score 0 · Answer 3 · answered Mar 29 '23 at 02:36

This algorithm consider the first 3-rows group ([0, 1, 2]) then sorts the data inside, thus the second 3-rows group ([3, 4, 5]) and sorts the data, and so on... .

Ascending order is set to False.

Generalization is possible with N parameter.

Proposed script

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Max', 'Stefan', 'John', 'Kate', 'Walter', 'Karin', 'Julia', 'Ben', 'Spencer'],
    'Balance': [0.12, 0.03, 0.12, 0.12, 0.12, 0.03, 0.03, 0.06, 0.03]})

N=3
r = (df.groupby(np.arange(len(df.index))//N, axis=0)
       .apply(lambda g: g.sort_values(by='Balance', ascending=False))
       .reset_index(drop=True)
       )

print(r)

Result

      Name  Balance
0      Max     0.12
1     John     0.12
2   Stefan     0.03
3     Kate     0.12
4   Walter     0.12
5    Karin     0.03
6      Ben     0.06
7    Julia     0.03
8  Spencer     0.03

ziying35 · Answer 4 · 2023-03-29T08:41:08.130

try this:

df = pd.DataFrame({
    'Name': ['Max', 'Stefan', 'John', 'Kate', 'Walter', 'Karin', 'Julia', 'Ben', 'Spencer'],
    'Balance': [0.12, 0.03, 0.12, 0.12, 0.12, 0.03, 0.03, 0.06, 0.03]})

groups = []
while not df.empty:
    max_idx, min_idx = df['Balance'].idxmax(), df['Balance'].idxmin()
    if (between_min_max := df['Balance'].between(
        *df['Balance'].loc[min_idx, max_idx], inclusive='neither')).any():

        mid_idx = between_min_max.idxmax()

    else:
        mid_idx = df['Balance'].drop(index=[max_idx, min_idx]).idxmax()
    
    group = df.loc[[max_idx, mid_idx, min_idx]]    
    data.append(group)    
    df.drop(index=group.index, inplace=True)

result = pd.concat(groups)
print(result)
>>>
      Name  Balance
0      Max     0.12
7      Ben     0.06
1   Stefan     0.03
2     John     0.12
3     Kate     0.12
5    Karin     0.03
4   Walter     0.12
8  Spencer     0.03
6    Julia     0.03

How to sort data in Dataframe by the certain condition?

4 Answers4