Given this DataFrame:

import pandas as pd
import numpy as np

data = {'column1': [True,False, False, True, True],
        'column2' : [np.nan,0.21, np.nan, 0.2222, np.nan],
        'column3': [1000, 0, 0, 0, 0 ]}


df = pd.DataFrame.from_dict(data)

print(df)

   column1  column2  column3
0     True      NaN     1000
1    False   0.2100        0
2    False      NaN        0
3     True   0.2222        0
4     True      NaN        0

How can I multiply the value in column2 by the previous value of column3 when column2 is not NaN, and otherwise just carry forward the previous value of column3?

The result should look something like this:

   column1  column2  column3
0     True      NaN     1000
1    False   0.2100      210
2    False      NaN      210
3     True   0.2222   46.662
4     True      NaN   46.662

I've been browsing through similar questions, but I just can't get my head around it.

I'd appreciate your input :)

Finger twist
  • https://stackoverflow.com/questions/23330654/update-a-dataframe-in-pandas-while-iterating-row-by-row – Hielke Walinga Aug 21 '19 at 09:38
  • Could you post what you've tried and perhaps what hasn't worked? Do you understand (or prefer) list comprehensions? And does column1 have any effect on the output? – rajah9 Aug 21 '19 at 09:39
  • @rajah9 column1 has no effect on the output. I've been trying to create column3 using np.where(np.isnan('column2'), etc. etc., but I'm getting nowhere and I'm not sure it's the right strategy. – Finger twist Aug 21 '19 at 09:59

4 Answers

You can give this a try:

# Replace 0 with NaN in column3 (assign returns a modified copy of df)
m = df.assign(column3=df.column3.replace(0, np.nan))
# Where column2 is not null, forward-fill along axis 1 so column3 picks up column2's value,
# then take the last column, apply cumprod and forward-fill the remaining NaNs
final = df.assign(column3=m.mask(m.column2.notna(), m.ffill(axis=1)).iloc[:, -1].cumprod().ffill())
print(final)

   column1  column2   column3
0     True      NaN  1000.000
1    False   0.2100   210.000
2    False      NaN   210.000
3     True   0.2222    46.662
4     True      NaN    46.662
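
The same idea can be written more directly. This is only a minimal sketch, assuming that the first row of column3 holds the starting value and that the remaining rows of column3 are placeholders to be overwritten; the names `factors` and `result` are illustrative and not from the original post. NaN values in column2 are treated as a neutral factor of 1, so a cumulative product carries the previous value forward.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [True, False, False, True, True],
    "column2": [np.nan, 0.21, np.nan, 0.2222, np.nan],
    "column3": [1000, 0, 0, 0, 0],
})

# Treat NaN in column2 as a factor of 1, then take the running product
factors = df["column2"].fillna(1).cumprod()

# Scale the running product by the starting value in the first row of column3
# (assumption: only that first value of column3 is meaningful)
result = df.assign(column3=df["column3"].iloc[0] * factors)
print(result)
```

Because `assign` returns a new DataFrame, the original `df` is left unchanged.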
anky

Use isnull() and .at

Example:

import pandas as pd
import numpy as np

data = {'column1': [True,False, False, True, True],
        'column2' : [np.nan,0.21, np.nan, 0.2222, np.nan],
        'column3': [1000, 0, 0, 0, 0 ]}
df = pd.DataFrame.from_dict(data)
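# Cast to float so fractional results such as 46.662 are not truncated to int
df['column3'] = df['column3'].astype(float)
pre_idx = 0
for idx in df.index:
    is_nan = pd.isnull(df.at[idx, 'column2'])
    if idx != 0:
        pre_idx = idx - 1  # assumes the default integer index
    # Multiply the previous column3 value by column2, or by 1 when column2 is NaN
    df.at[idx, 'column3'] = df.at[pre_idx, 'column3'] * (1 if is_nan else df.at[idx, 'column2'])
print(df)

Output:

   column1  column2   column3
0     True      NaN  1000.000
1    False   0.2100   210.000
2    False      NaN   210.000
3     True   0.2222    46.662
4     True      NaN    46.662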
bharatk

I would define a small helper class that accumulates the last value of column3 and then apply it to each row in order. This way you replace the explicit for loop with a single map call. Note that the accumulator is stateful, so the rows must still be processed sequentially.

class AccumMult:
    def __init__(self):
        self.last_val = None

    def mult(self, c2, c3):
        self.last_val = c3 if self.last_val is None else self.last_val
        if not np.isnan(c2):
            self.last_val = self.last_val * c2
        return self.last_val

m = AccumMult()

df["column3"] = list(map(lambda x: m.mult(x[0], x[1]), df[["column2", "column3"]].values.tolist()))
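
The accumulator pattern above can also be sketched with `itertools.accumulate` from the standard library, which keeps the running value for you. This is a hedged illustration rather than the answer's own method: it assumes Python 3.8+ (for the `initial` argument) and that the first row of column3 holds the starting value; `factors`, `start` and `running` are hypothetical names.

```python
from itertools import accumulate
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [True, False, False, True, True],
    "column2": [np.nan, 0.21, np.nan, 0.2222, np.nan],
    "column3": [1000, 0, 0, 0, 0],
})

# NaN in column2 means "keep the previous value", i.e. multiply by 1
factors = df["column2"].fillna(1.0).tolist()

# Starting value taken from the first row of column3 (assumption)
start = float(df["column3"].iloc[0])

# accumulate yields start, start*f0, start*f0*f1, ...; drop the leading start
running = list(accumulate(factors, operator.mul, initial=start))[1:]

df["column3"] = running
print(df)
```

Like the class-based version, this walks the rows in order, so it is sequential by construction.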
ivallesp

Since the value in each row depends on all the rows before it, one straightforward option is to iterate over the rows. You could do:

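# Cast to float so fractional results are not truncated to int
df['column3'] = df['column3'].astype(float)
prev = df.at[0, 'column3']
for e, c2 in df['column2'].items():
    # Multiply the running value by column2, or by 1 when column2 is NaN
    prev = df.at[e, 'column3'] = prev * (1 if np.isnan(c2) else c2)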
ilmiacs