# Python code to replace each cell with (value - row mean) / row variance

for i in range(len(df)):
    for j in range(2,5):
        df.iloc[i,j]=((df.iloc[i,j]-np.mean(df.iloc[i,2:]))/np.var(df.iloc[i,2:]))

My data is arranged in matrix format, where j indexes the observations (columns) and i indexes the year (rows), as follows.

df = pd.DataFrame({'Index': ['01/01/2019', '01/02/2019', '01/03/2019'],
                   'descriptor': ['BV', 'BV', 'BV'],
                   'abc': [0.8, 0.7, 0.6],
                   'bcd': [0.5, 0.3, 0.9],
                   'efg': [0.6, 0.5, 0.3]})

Output: [image not included]

The correct output should be: [image not included]

Rahul
  • Please read this https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples, and include the relevant data in your question (as text) - both input and expected output. – Roy2012 Jul 22 '20 at 06:18
  • Please post the question properly. I don't think people can understand what the data is or what the expected output is. – ShikharDua Jul 22 '20 at 06:24
  • #test_data df = pd.DataFrame({'Index': ['01/01/2019', '01/02/2019', '01/03/2019'], 'abc': [0.8, 0.7, 0.6], 'bcd':[0.5,0.3,0.9], 'efg':[0.6,0.5,0.3]}) – Rahul Jul 22 '20 at 06:57
  • @Rahul - please add the relevant data to the **question itself**. Also - please add the expected output. – Roy2012 Jul 22 '20 at 06:58
  • Could you please explain how you get from the input to the output? What calculation leads to 7.5, for example? – Roy2012 Jul 22 '20 at 09:09
  • @Roy2012, take the mean and variance of the first row (0.8, 0.5, 0.6), then subtract the mean from each value and divide the result by the variance, so it should be like (0.8 - average(0.8, 0.5, 0.6)) / variance(0.8, 0.5, 0.6) – Rahul Jul 22 '20 at 09:14

1 Answer

I believe this is what you're looking for:

df.iloc[:, 2:].subtract(df.iloc[:, 2:].mean(axis=1), axis=0).div(df.iloc[:, 2:].var(axis=1), axis=0)

The same calculation, in several steps for clarity:

relevant_columns = df.iloc[:,2:]

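# Calculate mean and var per row (hence axis=1), using only the numeric columns.
# Recent pandas versions raise a TypeError if the string columns are included.
mean_per_row = relevant_columns.mean(axis=1)
var_per_row = relevant_columns.var(axis=1)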

# subtract and then divide along the columns (hence axis=0)
val_minus_mean = relevant_columns.subtract(mean_per_row, axis=0)
res = val_minus_mean.div(var_per_row, axis=0)
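
On the axis question: the per-row mean and variance are Series indexed by row label, so axis=0 tells subtract and div to match that index against the DataFrame's rows. With the default (axis='columns'), pandas would instead try to match the Series index against the column names. Below is a minimal, self-contained sketch of the same calculation with a NumPy cross-check. The variable names are mine, and it assumes sample variance (ddof=1, the pandas default) is what you want. Note that np.var defaults to ddof=0, so it gives different numbers unless you pass ddof=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Index': ['01/01/2019', '01/02/2019', '01/03/2019'],
    'descriptor': ['BV', 'BV', 'BV'],
    'abc': [0.8, 0.7, 0.6],
    'bcd': [0.5, 0.3, 0.9],
    'efg': [0.6, 0.5, 0.3],
})

values = df.iloc[:, 2:]           # numeric columns only
row_mean = values.mean(axis=1)    # one value per row, indexed 0, 1, 2
row_var = values.var(axis=1)      # sample variance (ddof=1), the pandas default

# axis=0 aligns the Series index with the DataFrame's row labels,
# so row i is shifted by row_mean[i] and scaled by row_var[i].
res = values.subtract(row_mean, axis=0).div(row_var, axis=0)

# Same computation with NumPy broadcasting: keepdims=True keeps the
# per-row statistics as a column, so they broadcast across each row.
arr = values.to_numpy()
check = (arr - arr.mean(axis=1, keepdims=True)) / arr.var(axis=1, ddof=1, keepdims=True)
assert np.allclose(res.to_numpy(), check)
```

Computing the statistics once, before any values are replaced, also avoids a problem in the original loop, where each later cell in a row is standardized against a mean and variance that already include the overwritten cells.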
Roy2012
  • Yes, this gives the correct result. Can you explain the logic behind axis=0 in df.iloc[:,2:].subtract(df.mean(axis=1), axis=0), and then again axis=0 in df.iloc[:,2:].subtract(df.mean(axis=1), axis=0).div(df.var(axis=1), axis=0)? Many thanks. – Rahul Jul 22 '20 at 10:29
  • See the additional notes in the answer. – Roy2012 Jul 22 '20 at 11:08