Is there a NumPy broadcasting solution for building a matrix of the standard deviations of the differences between all pairs of columns in a DataFrame?
The solution in "Pandas - Creating Difference Matrix from Data Frame" was very useful, but it only works for the mean difference (for example independence, etc.).
Thanks to @divakar, @ayhan, @jezrael, and others in that discussion.
The input is a DataFrame, dfA:
        A1    B1    C1
    1   8.01  9.29  7.31
    2   8.23  9.05  7.46
    3   8.16  9.68  7.34
    4   8.27  8.95  7.05
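To make the example reproducible, the sample input can be rebuilt as a DataFrame. This is a minimal sketch; the name dfA matches the code used later in the question, and the 1-based index simply mirrors the printed table.

```python
import pandas as pd

# Sample input from the question; index 1..4 mirrors the printed table
dfA = pd.DataFrame(
    {
        "A1": [8.01, 8.23, 8.16, 8.27],
        "B1": [9.29, 9.05, 9.68, 8.95],
        "C1": [7.31, 7.46, 7.34, 7.05],
    },
    index=[1, 2, 3, 4],
)
print(dfA)
```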
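The two desired outputs are DfM and DfStd, holding the mean and the standard deviation of the differences between each pair of columns, with this layout (row i, column j holds column j minus column i):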
         St1      St2      St3
    1    A1-A1    B1-A1    C1-A1
    2    A1-B1    B1-B1    C1-B1
    3    A1-C1    B1-C1    C1-C1
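One possible broadcasting approach (a sketch, not the method from the linked post): build the full cube of row-wise pairwise differences, then reduce over the rows axis with both mean and std. The names v, diffs, DfM and DfStd are illustrative; the result is labelled with the column names rather than St1..St3, and ddof=1 is assumed so the result matches pandas' default .std() (np.std defaults to ddof=0).

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(
    {
        "A1": [8.01, 8.23, 8.16, 8.27],
        "B1": [9.29, 9.05, 9.68, 8.95],
        "C1": [7.31, 7.46, 7.34, 7.05],
    },
    index=[1, 2, 3, 4],
)

v = dfA.to_numpy()  # shape (n_rows, n_cols)

# Broadcast to every pairwise difference, row by row:
# diffs[r, i, j] = v[r, j] - v[r, i]  (column j minus column i),
# matching the layout above (row i, column j holds col_j - col_i)
diffs = v[:, None, :] - v[:, :, None]  # shape (n_rows, n_cols, n_cols)

cols = dfA.columns
# Reduce over the rows axis; ddof=1 matches pandas' default .std()
DfM = pd.DataFrame(diffs.mean(axis=0), index=cols, columns=cols)
DfStd = pd.DataFrame(diffs.std(axis=0, ddof=1), index=cols, columns=cols)
print(DfM.round(4))
print(DfStd.round(4))
```

The intermediate cube holds n_rows x n_cols x n_cols values, which is fine for small frames but can get large for many columns.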
I was able to derive the matrix of the means using the np.subtract.outer function described in the previous post by running:
[in]: arrmean = np.subtract.outer(*[dfA.mean().to_numpy()] * 2).T
[out]: a 3x3 array with 9 elements
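As a sanity check on the mean matrix, here is a sketch comparing np.subtract.outer on the column means with the mean of the explicit row-wise differences. The Series is converted with .to_numpy() first, since pandas' handling of ufunc.outer on Series has changed between versions; m, v and mean_of_diffs are illustrative names.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(
    {
        "A1": [8.01, 8.23, 8.16, 8.27],
        "B1": [9.29, 9.05, 9.68, 8.95],
        "C1": [7.31, 7.46, 7.34, 7.05],
    },
    index=[1, 2, 3, 4],
)

m = dfA.mean().to_numpy()  # column means, shape (n_cols,)
# outer(m, m)[i, j] = m[i] - m[j]; the transpose gives m[j] - m[i]
arrmean = np.subtract.outer(m, m).T

# Mean of the explicit differences col_j - col_i, computed row by row
v = dfA.to_numpy()
mean_of_diffs = (v[:, None, :] - v[:, :, None]).mean(axis=0)

print(np.allclose(arrmean, mean_of_diffs))  # True: the mean is linear
```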
This works because the mean of the differences equals the difference of the means (the mean is linear). For the standard deviation this relationship does not hold, since Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), so np.subtract.outer(*[dfA.std()]*2).T yields an incorrect matrix. I am trying to replace [dfA.std()] with [np.std(dfA['A1'] - dfA['B1'])], but that yields a 1x1 array with value zero (an obvious error: subtracting a single scalar from itself gives 0).
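A sketch that avoids the row-wise difference cube uses the variance identity on the sample covariance matrix: Var(X_j - X_i) = Var(X_i) + Var(X_j) - 2 Cov(X_i, X_j), broadcast over all column pairs. This assumes ddof=1 throughout (DataFrame.cov and DataFrame.std both default to it); C, d, var_diff and std_diff are illustrative names.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(
    {
        "A1": [8.01, 8.23, 8.16, 8.27],
        "B1": [9.29, 9.05, 9.68, 8.95],
        "C1": [7.31, 7.46, 7.34, 7.05],
    },
    index=[1, 2, 3, 4],
)

# Sample covariance matrix of the columns (ddof=1, like pandas .std())
C = dfA.cov().to_numpy()
d = np.diag(C)  # per-column variances

# Var(X_j - X_i) = Var(X_i) + Var(X_j) - 2 Cov(X_i, X_j), for all pairs at once
var_diff = d[:, None] + d[None, :] - 2 * C
# Clip any tiny negative round-off before taking the square root
std_diff = np.sqrt(np.clip(var_diff, 0, None))

DfStd = pd.DataFrame(std_diff, index=dfA.columns, columns=dfA.columns)

# Cross-check one entry against the direct computation
print(np.isclose(DfStd.loc["A1", "B1"], (dfA["B1"] - dfA["A1"]).std()))  # True
```

The standard deviation of a difference is symmetric (std(X - Y) = std(Y - X)), so unlike the mean matrix no transpose is needed here.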
Any ideas?
I have derived the solution manually but would be very grateful for the script.