Is there a numpy broadcasting solution for creating a matrix of the standard deviation of the differences between all pairs of columns in a DataFrame?

The following solution was very useful, but it works only for the mean difference (for example independence, etc...): Pandas - Creating Difference Matrix from Data Frame.
Thanks to @divakar, @ayhan, @jezrael, and others in that discussion.

The input would be a DataFrame, dfA:

0  A1     B1     C1
1  8.01   9.29   7.31
2  8.23   9.05   7.46
3  8.16   9.68   7.34
4  8.27   8.95   7.05 

The two desired outputs are dfM and dfStd, holding the mean and the standard deviation of the differences between each pair of columns, laid out as:

0   St1   St2   St3
1 a1-a1  b1-a1  c1-a1
2 a1-b1  b1-b1  c1-b1
3 a1-c1  b1-c1  c1-c1

I was able to derive the matrix of the means using the np.subtract.outer function described in the previous post by running:

[in]:arrmean = np.subtract.outer(*[dfA.mean()]*2).T
[out]: a 3x3 array with 9 elements

This works because the mean of the differences equals the difference of the means. For the standard deviation, that relationship does not hold: np.subtract.outer(*[dfA.std()]*2).T yields an incorrect matrix. I tried replacing [dfA.std()] with [np.std(dfA['A1'] - dfA['B1'])], but that yields a 1x1 array with value zero (an obvious error).
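One possible way to get both matrices with broadcasting is sketched below; it is not taken from the linked post. It builds every pairwise column difference in a single 3-D array and then reduces over the row axis. The sample values are copied from the table above. The choice of ddof=1 is an assumption made to match the pandas default for std(); np.std defaults to ddof=0, so change it if that is the convention you want. The last lines cross-check the result against the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), which is also why taking the difference of the individual standard deviations does not work.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
dfA = pd.DataFrame({
    "A1": [8.01, 8.23, 8.16, 8.27],
    "B1": [9.29, 9.05, 9.68, 8.95],
    "C1": [7.31, 7.46, 7.34, 7.05],
})

a = dfA.to_numpy()  # shape (n_rows, n_cols)

# diffs[k, i, j] = a[k, j] - a[k, i], so the cell at row label i and
# column label j holds (column j - column i), as in the question's layout.
diffs = a[:, None, :] - a[:, :, None]  # shape (n_rows, n_cols, n_cols)

cols = dfA.columns
dfM = pd.DataFrame(diffs.mean(axis=0), index=cols, columns=cols)
# ddof=1 is an assumption: it matches the pandas default (sample std).
dfStd = pd.DataFrame(diffs.std(axis=0, ddof=1), index=cols, columns=cols)

# Cross-check: Var(X - Y) = Var(X) + Var(Y) - 2 * Cov(X, Y).
cov = dfA.cov().to_numpy()  # sample covariance (ddof=1)
var = np.diag(cov)
# clip guards against tiny negative values from floating-point rounding.
std_from_cov = np.sqrt(np.clip(var[:, None] + var[None, :] - 2 * cov, 0, None))
```

The intermediate array has n_rows x n_cols x n_cols elements, so for wide DataFrames the covariance-based version may be the lighter option.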

Any ideas?

I have derived the solution manually but would be very grateful for the script.

Luan Naufal
itutle

1 Answer

Update: I was not able to find a direct formula to build a matrix for the standard deviation of the difference between ALL combinations of columns in a DataFrame. The only way I found was to: 1) iterate over all n(n-1)/2 combinations of the n columns in dfA and create a new DataFrame (dfB); 2) get descriptive stats on dfB with describe() and build the matrix from them.
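A minimal sketch of those two steps follows. It is an illustration only, not the original script: it assumes that dfB holds one difference column per pair, and the column naming scheme ("A1-B1" and so on) and the way the matrices are filled are hypothetical choices. The sample values are the ones from the question.

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
dfA = pd.DataFrame({
    "A1": [8.01, 8.23, 8.16, 8.27],
    "B1": [9.29, 9.05, 9.68, 8.95],
    "C1": [7.31, 7.46, 7.34, 7.05],
})

# Step 1: one difference column per unordered pair, n(n-1)/2 in total.
pairs = list(combinations(dfA.columns, 2))
dfB = pd.DataFrame({f"{x}-{y}": dfA[x] - dfA[y] for x, y in pairs})

# Step 2: descriptive stats, then fill the square matrices from them.
stats = dfB.describe()  # the "std" row uses the sample std (ddof=1)
dfM = pd.DataFrame(0.0, index=dfA.columns, columns=dfA.columns)
dfStd = pd.DataFrame(0.0, index=dfA.columns, columns=dfA.columns)
for x, y in pairs:
    name = f"{x}-{y}"
    # Row y, column x holds (x - y), as in the question's layout.
    dfM.loc[y, x] = stats.loc["mean", name]
    dfM.loc[x, y] = -stats.loc["mean", name]
    # The std of a difference does not depend on its sign.
    dfStd.loc[y, x] = stats.loc["std", name]
    dfStd.loc[x, y] = stats.loc["std", name]
```

The diagonal stays at zero because a column minus itself is constant.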

itutle