Python loop to replace element by the difference with mean / variance of the series

Question

#python code to replace the value of each cell by mean and variance

for i in range(len(df)):
    for j in range(2,5):
        df.iloc[i,j]=((df.iloc[i,j]-np.mean(df.iloc[i,2:]))/np.var(df.iloc[i,2:]))

My data is arranged in matrix format, where j indexes the observations (columns) and i indexes the year (rows), as follows.

df = pd.DataFrame({'Index': ['01/01/2019', '01/02/2019', '01/03/2019'],
                   'descriptor': ['BV', 'BV', 'BV'],
                   'abc': [0.8, 0.7, 0.6],
                   'bcd': [0.5, 0.3, 0.9],
                   'efg': [0.6, 0.5, 0.3]})

Output: [image not included]

The correct output should be: [image not included]

Please read this https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples, and include the relevant data in your question (as text) - both input and expected output. — Roy2012, Jul 22 '20 at 06:18
Please post the question properly. I don't think people can understand what the data is or what the expected output is. — ShikharDua, Jul 22 '20 at 06:24
#test_data df = pd.DataFrame({'Index': ['01/01/2019', '01/02/2019', '01/03/2019'], 'abc': [0.8, 0.7, 0.6], 'bcd':[0.5,0.3,0.9], 'efg':[0.6,0.5,0.3]}) — Rahul, Jul 22 '20 at 06:57
@Rahul - please add the relevant data to the **question itself**. Also - please add the expected output. — Roy2012, Jul 22 '20 at 06:58
Could you please explain how you get from the input to the output? What calculation leads to 7.5, for example? — Roy2012, Jul 22 '20 at 09:09
@Roy2012, take the mean and variance of the first row (0.8, 0.5, 0.6), then subtract the mean from each value and divide the result by the variance, so it should be like (0.8 - average(0.8, 0.5, 0.6)) / variance(0.8, 0.5, 0.6) — Rahul, Jul 22 '20 at 09:14
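
The calculation described in the last comment can be checked by hand for the first row. This is only a sketch using the standard library; the names row, scaled_sample and scaled_population are made up for illustration. The thread does not say whether "variance" means the sample variance (divide by n - 1) or the population variance (divide by n), so both are computed here.

```python
from statistics import mean, pvariance, variance

# First row of the sample data: columns abc, bcd, efg
row = [0.8, 0.5, 0.6]

row_mean = mean(row)
sample_var = variance(row)       # divides by n - 1 (what pandas .var() does by default)
population_var = pvariance(row)  # divides by n (what np.var does by default)

# (value - mean of the row) / variance of the row, for each value
scaled_sample = [(x - row_mean) / sample_var for x in row]
scaled_population = [(x - row_mean) / population_var for x in row]

print(scaled_sample)
print(scaled_population)
```

The two results differ by a constant factor, so it is worth deciding which variance is intended before comparing against an expected output.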

Roy2012 · Accepted Answer · 2020-07-22T11:08:44.017


I believe this is what you're looking for:

df.iloc[:,2:].subtract(df.iloc[:,2:].mean(axis=1), axis=0).div(df.iloc[:,2:].var(axis=1), axis=0)

The same calculation, in several steps for clarity:

relevant_columns = df.iloc[:,2:]

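# Calculate mean and var per row (hence axis=1), using only the numeric
# columns: recent pandas versions raise on the string columns otherwise
mean_per_row = relevant_columns.mean(axis=1)
var_per_row = relevant_columns.var(axis=1)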

# subtract and then divide along the columns (hence axis=0)
val_minus_mean = relevant_columns.subtract(mean_per_row, axis=0)
res = val_minus_mean.div(var_per_row, axis=0)
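
For reference, here is a self-contained sketch of the steps above run against the sample data from the question. The names values and out are only illustrative, and writing the result back into a copy is an assumption about what the asker wants, not part of the original answer. Note that pandas .var() defaults to ddof=1 (sample variance), while np.var in the question's loop defaults to ddof=0.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': ['01/01/2019', '01/02/2019', '01/03/2019'],
    'descriptor': ['BV', 'BV', 'BV'],
    'abc': [0.8, 0.7, 0.6],
    'bcd': [0.5, 0.3, 0.9],
    'efg': [0.6, 0.5, 0.3],
})

# Work only on the numeric columns (everything after 'descriptor')
values = df.iloc[:, 2:]

# One mean and one variance per row, computed once from the untouched data
mean_per_row = values.mean(axis=1)
var_per_row = values.var(axis=1)  # pass ddof=0 here to match np.var

# Subtract, then divide, matching each statistic to its row
res = values.sub(mean_per_row, axis=0).div(var_per_row, axis=0)

# Assumption: put the scaled values back next to the label columns
out = df.copy()
out[values.columns] = res
print(out)
```

Computing the statistics once up front also avoids a pitfall in the question's loop: there, each cell is overwritten before the row mean and variance are recomputed for the next cell, so later cells are scaled using already-modified values.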

edited Jul 22 '20 at 11:08

answered Jul 22 '20 at 09:27


Yes, this gives the correct result. Can you explain the logic behind axis=0 in df.iloc[:,2:].subtract(df.mean(axis=1), axis=0), and then again axis=0 in df.iloc[:,2:].subtract(df.mean(axis=1), axis=0).div(df.var(axis=1), axis=0)? Many thanks. – Rahul Jul 22 '20 at 10:29
See the additional notes in the answer. – Roy2012 Jul 22 '20 at 11:08
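
To make the axis=0 question concrete, here is a small sketch (the names values, row_means, by_rows and by_columns are illustrative, not from the answer). The per-row mean is a Series indexed by the row labels 0, 1, 2. With axis=0, pandas matches that index against the DataFrame's row labels; with the default, it matches against the column labels instead, and nothing lines up.

```python
import pandas as pd

values = pd.DataFrame({'abc': [0.8, 0.7, 0.6],
                       'bcd': [0.5, 0.3, 0.9],
                       'efg': [0.6, 0.5, 0.3]})

# axis=1 here means "reduce across the columns": one mean per row
row_means = values.mean(axis=1)

# axis=0 here means "align the Series index with the row labels":
# row 0 has row_means[0] subtracted, row 1 has row_means[1], and so on
by_rows = values.sub(row_means, axis=0)

# Default axis='columns': the labels 0, 1, 2 are compared with
# 'abc', 'bcd', 'efg', nothing matches, and every cell ends up NaN
by_columns = values.sub(row_means)

print(by_rows)
print(by_columns)
```

The same reasoning applies to .div(var_per_row, axis=0): the variance Series is also indexed by row, so it has to be aligned with the rows as well.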
