How best to create a new column where each value is determined by the current row as well as other rows?

Question

Here's just a sample dataframe:

import pandas as pd

df = pd.DataFrame([[0, 234, 1000], [1, 324, 1015], [2, 343, 1045]], columns=["num", "num1", "num2"])

    num num1 num2
0   0   234  1000
1   1   324  1015
2   2   343  1045

I would like to create a fourth column that contains the current value for the num1 column, and the two previous values for num1 but only if those values are larger than 300.

I tried this answer to some extent: Apply function to pandas dataframe row using values in other rows

However, I'm not sure how to make it conditional on whether the two previous rows are greater than a certain number.

When in doubt, make a "cheater" or "helper" column that is just the instances of num1 where its value is greater than 300, else 0. I know some folks here will pass out at the suggestion of a seemingly unnecessary column, but if your df is not gigantic and the code & intent are clear, it should be straightforward. — AirSquid, Feb 06 '20 at 04:13
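A minimal sketch of the helper-column idea from the comment above, run on the sample frame from the question. The column names `helper`, `prev1`, and `prev2` are made up for illustration, and filling the missing earlier rows with 0 is an assumption rather than something the question specifies.

```python
import pandas as pd

df = pd.DataFrame([[0, 234, 1000], [1, 324, 1015], [2, 343, 1045]],
                  columns=["num", "num1", "num2"])

# Helper column: num1 where it is larger than 300, otherwise 0.
df["helper"] = df["num1"].where(df["num1"] > 300, 0)

# Shift the helper column down to line up the previous one and two rows
# with the current row. Rows with no predecessor get 0 (an assumption).
df["prev1"] = df["helper"].shift(1, fill_value=0)
df["prev2"] = df["helper"].shift(2, fill_value=0)

print(df)
```

With the previous values sitting in their own columns, any per-row combination (a sum, a list, a tuple) can be built from `num1`, `prev1`, and `prev2` without looking at other rows again.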

score 0 · Answer 1 · answered Feb 06 '20 at 08:08

I could only get this working for the next two values of each row, not the previous two; maybe someone else can figure that out.

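# Keep the current value (i == 0) plus any of the next two num1 values that are >= 300.
# Assumes the default RangeIndex, so x.name is also the row position.
df['val'] = df.apply(lambda x: [val for i, val in enumerate(df['num1'].iloc[x.name:x.name + 3].to_list()) if val >= 300 or i == 0], axis=1)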

print(df)

Output (run on a five-row variant of the sample data)

   num  num1    num2    val
0   0   123     1000    [123, 324]
1   0   234     1000    [234, 324, 343]
2   1   324     1015    [324, 343]
3   2   343     1045    [343]
4   0   123     1000    [123]
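For the previous two values, which is what the question actually asks for, one possible sketch is below. It rebuilds the five-row frame from the output above. The helper name `current_and_previous` and its parameters are hypothetical, and it makes two assumptions: the strict "larger than 300" test from the question is used (the code above uses `>= 300`), and the earlier values are listed before the current one.

```python
import pandas as pd

# Five-row frame reconstructed from the printed output above.
df = pd.DataFrame(
    [[0, 123, 1000], [0, 234, 1000], [1, 324, 1015], [2, 343, 1045], [0, 123, 1000]],
    columns=["num", "num1", "num2"],
)

values = df["num1"].to_list()

def current_and_previous(pos, threshold=300, lookback=2):
    """Hypothetical helper: up to `lookback` earlier values above `threshold`,
    followed by the current value, which is always kept."""
    start = max(pos - lookback, 0)  # do not run off the top of the column
    earlier = [v for v in values[start:pos] if v > threshold]
    return earlier + [values[pos]]

# Work by position rather than index label, so a non-default index is fine.
df["val"] = [current_and_previous(i) for i in range(len(values))]

print(df)
```

A plain list comprehension over positions avoids `df.apply(axis=1)`, which builds a Series for every row; for a small frame either approach should be fine.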
