Get max value rows then min value rows, or if a subset has all Nan, keep everything

Question

I have a pandas dataframe

x = pd.DataFrame({
    "t": ["A", "A", "A", "B", "B", "B", "B", "A", "C", "C"],
    "m": ["k", "m", "m", "k", "m", "m", "b", "k", "f", "d"],
    "f1": [1.2, np.nan,  0.8, 1, 1, 1.5, 1, np.nan, np.nan, np.nan],
    "f2": [100, 200, 200, 100, 100, 100, 50, 200, 300, 400]})

   t  m   f1   f2
0  A  k  1.2  100
1  A  m  NaN  200
2  A  m  0.8  200
3  B  k  1.0  100
4  B  m  1.0  100
5  B  m  1.5  100
6  B  b  1.0   50
7  A  k  NaN  200
8  C  f  NaN  300
9  C  d  NaN  400

On f1 I want to pick min value of each subgroup t. If a subgroup has Nan values, we need to pick all the values. On f2 I want to pick max values. In case of Nan values, we need to keep all the rows. So I want an output something similar to

   t  m   f1   f2
0  A  m  0.8  200
1  B  k  1.0  100
2  B  m  1.0  100
3  C  d  NaN  400

I was able to achieve this using

def keep_rows(k, col, op):
    # if all values are Nan 
    if np.isnan(k[col].values).all():
        return k 
    return k[k[col] == getattr(np, f"nan{op}")(k[col])]

tt = x.groupby("t", as_index=False).apply(lambda x: keep_rows(x, "f1", "min")).reset_index(drop=True)
tt = tt.groupby("t", as_index=False).apply(lambda x: keep_rows(x, "f2", "max")).reset_index(drop=True)
tt

But is there a better way? Especially in pandas v2?

Then please explain the logic and how this is not a duplicate — mozway, May 31 '23 at 03:49
firstly this is on multiple columns and need different operations on both of them and there can be multiple eligible rows. need a subset of it. I found the other question to be verbose and most of the answers are wrong. Also want to see if pandas v2 has any elegant solution for this. — Prakash Vanapalli, May 31 '23 at 04:59
This is not very clear in your question, nevertheless see the answer below if you need simultaneous conditions. If you want **successive** conditions, then this falls back to the duplicate. — mozway, May 31 '23 at 05:23
Warm welcome to SO. Please try to use correct upper case letters, e.g. in the beginning of your title, sentences or the word "I". This would be gentle to your readers. Please read [How to ask](https://stackoverflow.com/help/how-to-ask) and [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Then update your question with code to show us what you have tried so far. — buhtz, May 31 '23 at 10:42
@Prakash tbf it was a duplicate originally, but now it's totally different yeah — wjandrea, May 31 '23 at 16:22

score 1 · Answer 1 · edited May 31 '23 at 16:21

1

If the operations are simultaneous, you need a double groupby.transform:

updated question

g = x.assign(flag=x['f1'].isna()).groupby('t')

# does the row have the min f1 or its group?
m1 = g['f1'].transform('min').eq(x['f1'])
# does the group have all NaN in f1?
m2 = g['flag'].transform('all')
# does the row have the max f2 or its group?
m3 = g['f2'].transform('max').eq(x['f2'])

out = x.loc[(m1 | m2) & m3]

Output:

   t  m   f1   f2
2  A  m  0.8  200
3  B  k  1.0  100
4  B  m  1.0  100
9  C  d  NaN  400

original question

g = x.groupby('t')

m1 = g['f1'].transform('min').eq(x['f1'])
m2 = g['f2'].transform('max').eq(x['f2'])

out = x.loc[m1 & m2]

If the operations are consecutive, then you fall back to a known problem, just chain two selections.

Output:

   t  m   f1   f2
1  A  m  0.8  200
2  B  k  1.0  100
3  B  m  1.0  100

edited May 31 '23 at 16:21

wjandrea

28,235
9
60
81

answered May 31 '23 at 05:18

mozway

194,879
13
39
75

thanks, what if `f1` contains np.nan and we want to include those rows. when using `transform(min)`, those rows are removed. – Prakash Vanapalli May 31 '23 at 05:29
Then add a third condition: `m3 = x['f1'].isna()` and `out = x.loc[(m1&m2)|m3]`. Depending on your exact logic. – mozway May 31 '23 at 05:40
I have further edited this question. Please check once. Thanks – Prakash Vanapalli May 31 '23 at 06:19

Get max value rows then min value rows, or if a subset has all Nan, keep everything

1 Answers1

updated question

original question