0

I have some data that looks like this

Condition 1 Condition 2 Condition 3 Condition 4 Condition 5
0 0 0 70 0
0 50 10 0 0
120 0 0 5 5

Where the value in each cell is the number of meters of an asset that is the given condition. Or in other words, a count of the number of meters that are a '4'. How do I calculate a standard deviation for this? Obviously the std.dev would be '0' for the first row, higher for row 2, and fairly low for row 3. Something similar to REPT, but that repeats a value x times in a formula? I considered a helper column but the number of meters that there are in total makes this impractical.

  • Can you make it more tangible, please? It is unclear to me what you need. ⬤ Does the 1st row mean that there are 70 times the same value ('4' in your example) and no other value in that series? – Fernando Barbosa Jul 27 '21 at 01:19
  • Have you got the data that the counts came from originally? – JMP Jul 27 '21 at 05:00
  • To clarify, a field worker assesses an asset and records the condition by meters. For example in row 1, we have a 70m asset, all of which is condition 4. In row 2, we have a 60m asset, of which 50m is condition 2, and 10m is condition 3. In row 3 - 130m asset, 120m is '1', 5m is '4', 5m is '5'. I have posted an answer which I think has figured it out. – Peter Tomlinson Jul 27 '21 at 21:46

3 Answers3

2

I am not a math expert, but I can show you how to "make a range of numbers" based on the criteria shown, using Excel 365.

Suppose your data is in the range B2:F4 as shown below. In cell G2, enter the following formula and drag it down:

=STDEV.P(--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]"))

The above will calculate the standard deviation using the STDEV.P function, but I am unsure if this is the right function to use as there are many other variations to the original STDEV function.

Regardless, the following part of the formula is able to return a range of numbers as desired:

=--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]")

You can view this question and the answer by JvdV to understand the use of the FILTERXML function.

sample

Terry W
  • 3,199
  • 2
  • 8
  • 24
1

Another way of doing it is to use the alternative SD formula

enter image description here

which would give you

=SQRT((SUM(A2:E2*COLUMN(A2:E2)^2)-SUM(A2:E2*COLUMN(A2:E2))^2/SUM(A2:E2))/SUM(A2:E2))

for the population standard deviation.

The Excel 365 version using Let is more readable I think:

=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
sumxsq,SUM(mpy*x^2),
sumsqx,SUM(mpy*x)^2,
numerator,sumxsq-sumsqx/n,
SQRT(numerator/n)
)

A bit less obviously, you could get it from the original formula

enter image description here

=SQRT(SUM(A2:E2*(COLUMN(A2:E2)-SUM(A2:E2*COLUMN(A2:E2))/SUM(A2:E2))^2/SUM(A2:E2)))

Again, in Excel 365 you could write this as:

=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
xbar,SUM(mpy*x/n),
numerator,SUM(mpy*(x-xbar)^2),
SQRT(numerator/n)
)

enter image description here

Change the denominator to

(SUM(A2:E2)-1)

for the sample standard deviation.

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
0

I ended up figuring it out. I added a column which calculated the average. (Say column F) I then had a formula like this

=SQRT(SUM(A2*POWER((1-F2),2),B2*POWER((2-F2),2),C2*POWER((3-F2),2),D2*POWER((4-F2),2),E2*POWER((5-F2),2))/SUM(A2:E2))

Essentially this calculated the variance from the mean for each condition value, multiplied by the number of values (e.g. number of meters) of asset that are that particular condition, then did the normal other standard deviation calculations (sum, divide by total, square).