I am trying to create a column in which a value is replaced by the previous value in the same column when that previous value meets a condition. For example...

PREVIOUS col | CORRECT col
HW              HW
HW              HW
RF              RF
FW              FW
FW              FW
FWRF            FW
FW              FW 

As you can see, 'FWRF' has changed to 'FW' because the previous value in PREVIOUS col is 'FW'. I am trying to get [CORRECT col] to work but I have run into issues.

Here's what I have tried so far, to no avail...

df['CORRECT col'] = df.loc[df['PREVIOUS col'].shift(-1) == 'FWRF', 'PREVIOUS col']

I have also tried to write 'if' statements for this, and that was even worse... but I am still learning how to use pandas as effectively as I can. Your help will be greatly appreciated.

  • Does this answer your question? [Creating New Column based on condition on Other Column in Pandas DataFrame](https://stackoverflow.com/questions/59642338/creating-new-column-based-on-condition-on-other-column-in-pandas-dataframe) – itprorh66 Nov 04 '21 at 20:30

1 Answer


There are two possibilities based on your description:

  1. you only need to check that the previous value is FW to change the current value to FW, or

  2. you also need to check that the current value is FWRF, together with the previous value being FW, to change the current value to FW.

For case 1, you can use .mask() and .shift(), as follows:

df['CORRECT col'] = df['PREVIOUS col'].mask(df['PREVIOUS col'].shift() == 'FW', 'FW')

Here, we get the previous row's value of PREVIOUS col using .shift() and check whether it is FW. If it is, .mask() changes the current row's value to FW as well; if the condition is False, the original value is kept.

Result:

print(df)

  PREVIOUS col CORRECT col
0           HW          HW
1           HW          HW
2           RF          RF
3           FW          FW
4           FW          FW
5         FWRF          FW
6           FW          FW
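
To reproduce the result above from scratch, here is a minimal self-contained sketch. The DataFrame construction is assumed from the sample table in the question, and the intermediate name `prev` is only illustrative:

```python
import pandas as pd

# Sample data assumed from the question's table.
df = pd.DataFrame({'PREVIOUS col': ['HW', 'HW', 'RF', 'FW', 'FW', 'FWRF', 'FW']})

# shift() moves every value down one row, so each row sees the value above it;
# the first row has no predecessor and becomes NaN.
prev = df['PREVIOUS col'].shift()

# NaN == 'FW' evaluates to False, so the first row is left unchanged.
df['CORRECT col'] = df['PREVIOUS col'].mask(prev == 'FW', 'FW')

print(df)
```

Printing `prev` next to the original column is a quick way to see exactly which rows the condition picks up.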

For case 2, where you also want to check that the current row value is FWRF, you can add one more check, as follows:

df['CORRECT col'] = df['PREVIOUS col'].mask((df['PREVIOUS col'] == 'FWRF') & (df['PREVIOUS col'].shift() == 'FW'), 'FW')

Here, we added the check (df['PREVIOUS col'] == 'FWRF') and combined it using & (and) with the condition (df['PREVIOUS col'].shift() == 'FW'), so that the value changes only when BOTH conditions are true.

The result is the same for the sample data, but check whether your real data requires one condition or both.
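
To see where the two cases diverge, here is a small sketch on made-up data (the Series below is hypothetical, not from the question) in which an 'RF' comes directly after an 'FW':

```python
import pandas as pd

# Hypothetical data: row 1 is 'RF' directly after an 'FW'.
s = pd.Series(['FW', 'RF', 'FWRF', 'FW', 'FWRF'])
prev = s.shift()

# Case 1: only the previous value is checked.
case1 = s.mask(prev == 'FW', 'FW')

# Case 2: the current value must be 'FWRF' AND the previous value 'FW'.
case2 = s.mask((s == 'FWRF') & (prev == 'FW'), 'FW')

print(pd.DataFrame({'original': s, 'case 1': case1, 'case 2': case2}))
```

Case 1 overwrites the 'RF' in row 1 because its predecessor is 'FW', while case 2 leaves it alone. Both leave the 'FWRF' in row 2 untouched, since its predecessor is 'RF'.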

SeaBean
  • Thank you so much. I have never heard of the mask() method before and it's exactly the thing I have been missing. The second option is the correct answer, I am sorry for not being clear enough in my question. – Michael Ray Nov 05 '21 at 04:46
  • If you wanted to check the 'next' value in the column for a condition then would you use df['PREVIOUS col'].shift(-1)? – Michael Ray Nov 05 '21 at 04:47
  • @MichaelRay That's right, use `.shift(-1)` to check the 'next' value. The default, when no number is supplied, is `.shift(1)`, which gets the 'previous' value. A negative number -n gets the n-th 'next' value. – SeaBean Nov 05 '21 at 06:05
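
As a rough illustration of the `.shift(-1)` variant discussed in the comments, the sketch below looks at the next row instead of the previous one. The data and the names `nxt` and `result` are hypothetical:

```python
import pandas as pd

s = pd.Series(['HW', 'FWRF', 'FW'])  # hypothetical data

# shift(-1) moves every value up one row, so each row sees the value below it;
# the last row has no successor and becomes NaN.
nxt = s.shift(-1)

# Replace 'FWRF' with 'FW' only when the NEXT value is 'FW'.
result = s.mask((s == 'FWRF') & (nxt == 'FW'), 'FW')

print(result.tolist())
```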