row(n-1)*rate + row(n) in a pandas dataframe

Question

I have a dataframe and a series of rates. My calculation is very simple:

new_row(n) = new_row(n-1)*rate + old_row(n)

I have 20 Columns in my dataframe. rate is a series of 20 (1 for each column). I have written a code using loops which take nearly 9 seconds to run. I believe that, it is not the ideal way of doing this exercise. I would like to find a Pythonic way of doing it.

data = pd.read_csv('data.csv')
ret_rate = pd.read_csv('Retention_Rate.csv')

ret_dat = data.copy()
for i in range(4, ret_dat.shape[1]):
    for j in range(1, ret_dat.shape[0]):
        if (ret_dat['MARKET_ID'][j] == ret_dat['MARKET_ID'][j-1]):
        ret_dat.iloc[j, i] = ret_dat.iloc[j, i] + ret_rate.iloc[i-4,0]*ret_dat.iloc[j-1, i]

ret_dat.to_csv('adstock_data_v3.csv')

I have put the data in a Google sheet.

thank you for this question. It helped me with a similar problem — Julien__, Jul 05 '20 at 16:03

OPhamster · Answer 1 · 2017-02-21T07:24:45.447

0

I'm not sure if this is what you were trying to achieve - but this is simpler and a direct translation of the formula u have given

data = pd.read_csv('data.csv')
ret_rate = pd.read_csv('Retention_Rate.csv',usecols=['rate'])
# since you require only the 5th column onwards
ret_data = data.ix[:,4:]

# you can apply broad operations over rows instead of individual cells now
for i in range(1,len(ret_data)):
    ret_data.iloc[i] = ret_data.iloc[i-1].multiply(ret_rate.rate.iloc[i-1]) + ret_dat.iloc[i]

ret_data.to_csv('your_filename.csv',sep=',')

edited Feb 21 '17 at 07:24

answered Feb 20 '17 at 12:36

OPhamster

11
1
5

may be I am making a stupid mistake... but this only gives me first line of output. rest is all NaN. – Nil Feb 20 '17 at 17:20

score 0 · Accepted Answer · edited May 23 '17 at 12:16

Finally found a solution. At this point, the most optimal solution that I could find. Leveraging Signal processing from scipy.signal.lfilter. Here goes the solution:

for media_var in self.config.media_vars: adstocked_vals = lfilter([1], [1, -ret_rate[media_var]], data[media_var])

Thanks to all who gave different answers. It did help me in some way or other. Thanks to a solution by Warren Weckesser on a different thread here

elcombato · Answer 3 · 2017-02-21T05:45:33.570

-1

Just use the slice of data which is of interest
The Series ret_rate is for demonstrating purposes a range with the length equal to the number of columns of data
You multiply data with ret_rate along the columns (axis=1) and shift this DataFrame by one and add the initial DataFrame

in code:

data = pd.read_csv("data_so.csv").iloc[:,4:]
ret_rate = pd.Series(range(df.shape[1]), index=df.columns)
data.multiply(ret_rate, 1).shift() + data

So all the calculation ist just one line of pandas.

For multiply() it is important that the index of ret_rate is equal to the column names of data.

edited Feb 21 '17 at 05:45

answered Feb 20 '17 at 12:35

elcombato

473
1
4
16

ret_rate has values in csv. facing difficulty in converting that into series with the index being the data.columns. any help with that please... `ret_rate_s = pd.Series(ret_rate.ix[:,0], index=data.columns)` gives me all NaN – Nil Feb 20 '17 at 17:37
To read the csv as a Series you should use the pandas function [`read_csv`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). Either the csv has only one column or you define the column to use with the parameter `usecols`. In either case if you set `squeeze=True` the `read_csv` function returns a Series instead of a DataFrame. And after reading the csv you can set `ret_rate.index = data.columns` – elcombato Feb 21 '17 at 05:44
It worked! @elcombato. I could create a series but then, only to realize that df.multiply cannot handle float type. Only Intergers allowed! bummer! – Nil Feb 22 '17 at 10:48
@Nil `df.multiply()` shouldn't have any problem with `float`! – elcombato Feb 22 '17 at 10:56
Correct! I found out that I had some text fields. my bad! – Nil Feb 22 '17 at 17:07
Hi @Nil if this or any answer has solved your question please consider [accepting it](http://meta.stackexchange.com/q/5234/179419) by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. – elcombato Feb 24 '17 at 15:42
Finally figured out an answer and posted below. The solution you gave was almost working except for the fact that it was taking the older value of the previous row. I needed to take the latest value of the previous row. Thanks for your help anyways. Got to learn about the `shift()` function. – Nil Feb 25 '17 at 14:09

row(n-1)*rate + row(n) in a pandas dataframe

3 Answers3