2

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({
    'col1':['A',pd.NA,pd.NA,pd.NA,pd.NA, 'B', pd.NA, pd.NA],
    'col2':[9.5, 6,24,8, 30, 7, 6, 8],
    })
print(df)

Giving:

   col1  col2
0     A   9.5
1  <NA>   6.0
2  <NA>  24.0
3  <NA>   8.0
4  <NA>  30.0
5     B   7.0
6  <NA>   6.0
7  <NA>   8.0

What I'd like to achieve is to forward fill col1, but not in rows where the value in col2 is greater than the col2 value of the row I'm forward filling from. Like so:

   col1  col2
0     A   9.5
1     A   6.0
2  <NA>  24.0
3     A   8.0
4  <NA>  30.0
5     B   7.0
6     B   6.0
7  <NA>   8.0

Here, index 1 and 3 are forward filled, but index 2 and 4 are not, as the col2 values (24, 30) are greater than the initial row's col2 value (9.5)

I can achieve the desired result by iterating through the df, like so:

val2 = -1
for i, r in df.iterrows():
    if not pd.isnull(r['col1']):
        val1 = r['col1']
        val2 = r['col2']
    else:
        if r['col2']<val2:
            df.loc[i, 'col1'] = val1
print(df)

Is there a way to achieve this without using iterrows()?

McBuch
  • 43
  • 6

1 Answers1

3

You can forward filling missing values to helper Series s and then compare groups by this Series - first value by all values and if values are not greater replace values in numpy.where:

s = df['col1'].ffill()

df['col1'] = np.where(df.groupby(s)['col2'].transform('first').gt(df['col2']),s,df['col1'])
print (df)
   col1  col2
0     A   9.5
1     A   6.0
2  <NA>  24.0
3     A   8.0
4  <NA>  30.0
5     B   7.0
6     B   6.0
7  <NA>   8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    I had almost the exact same approach with pandas' `where`, but you were just too fast ;) – mozway Feb 10 '22 at 08:35
  • 1
    Do you get a hidden "question-preview" bonus at 500k+ rep or you just drink 20 coffees a day? or robotic arm? brain keyboard interface? :p – mozway Feb 10 '22 at 08:38
  • @mozway - no, really busy, no time for answering, so only some question – jezrael Feb 10 '22 at 08:39
  • I was more talking about the speed ;) Took me 2 min to read, copy data and start typing answer and you already had posted! – mozway Feb 10 '22 at 08:40
  • 1
    Excellent solution and impressive response time, @jezrael – McBuch Feb 10 '22 at 08:54