R does not seem to take into account a separate variable I created and instead uses the whole list

Question

This is the code I used to find "what proportion of these numbers are within one standard deviation of the list's average?":

library(dplyr)
bodyweights <- dat$Bodyweight
mean_bodyweights <- mean(bodyweights)
sd_bodyweight <- sd(bodyweights)
lower_bound <- mean_bodyweights - sd_bodyweight
upper_bound <- mean_bodyweights + sd_bodyweight
proportion_with1sd <- pnorm(upper_bound, mean_bodyweights, sd_bodyweight) - pnorm(lower_bound, mean_bodyweights, sd_bodyweight)
print(proportion_with1sd)

#> [1] 0.6826895

The follow-up question asked: "Define y to be the weights of males on the control diet. What proportion of the mice are within one standard deviation of the average weight?" This is what I wrote:

library(dplyr)
library(rafalib)  # provides popsd()

male_chow_data <- dat %>%
  filter(Sex == "M" & Diet == "chow")
y <- male_chow_data$Bodyweight
mean_weight_chow <- mean(y)
sd_weight_chow <- popsd(y)
lower_bound <- mean_weight_chow - sd_weight_chow
upper_bound <- mean_weight_chow + sd_weight_chow
proportion_within1sd <- pnorm(upper_bound, mean_weight_chow, sd_weight_chow) - pnorm(lower_bound, mean_weight_chow, sd_weight_chow)
print(proportion_within1sd)

#> [1] 0.6826895

I do not understand why I get the same answer. Why is it not using the data I specified as "male_chow_data" instead of the entire list?

I used the rm() function and ran everything again, yet I get the same result.

Welcome to SO! It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data. As is one can only guess what might be the issue as we don't have your data and can't run your code. — stefan, Jul 30 '23 at 08:48

Robert Hacken · Answer 1 · 2023-07-31T09:32:17.163

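Your code computes the probability that a normally distributed value lies within one SD of its mean. After standardizing, the bounds are always z = -1 and z = 1, so this is the same as for the standard normal distribution and will always be 0.6826895, no matter what data you base your computation on.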

x <- runif(50)
pnorm(mean(x) + sd(x), mean(x), sd(x)) - pnorm(mean(x) - sd(x), mean(x), sd(x))
# [1] 0.6826895
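A minimal sketch of why filtering does not change the result. It assumes a hypothetical toy data frame `toy` in place of `dat` (whose contents are not shown) and a hypothetical helper `pnorm_within_1sd()` that wraps the question's calculation. The subset really does have a different mean and SD, but the pnorm() difference stays the same:

```r
# Hypothetical toy data standing in for `dat` (the real mouse data is not shown)
set.seed(42)
toy <- data.frame(
  Sex        = rep(c("F", "M"), each = 20),
  Diet       = rep(c("chow", "hf"), times = 20),
  Bodyweight = c(rnorm(20, mean = 24, sd = 3), rnorm(20, mean = 31, sd = 4))
)

# The question's calculation, wrapped in a hypothetical helper
pnorm_within_1sd <- function(w) {
  m <- mean(w)
  s <- sd(w)
  pnorm(m + s, m, s) - pnorm(m - s, m, s)
}

all_w <- toy$Bodyweight
y <- toy$Bodyweight[toy$Sex == "M" & toy$Diet == "chow"]  # the subset IS used

# The subset has a different mean and SD ...
c(all = mean(all_w), subset = mean(y))
c(all = sd(all_w), subset = sd(y))

# ... yet the pnorm() difference is the same, because the standardized
# bounds are (m + s - m) / s = 1 and (m - s - m) / s = -1 for any m and s
c(all = pnorm_within_1sd(all_w), subset = pnorm_within_1sd(y))
pnorm(1) - pnorm(-1)
```

In other words, R did use `y`; the formula simply cancels out whatever mean and SD it is given.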

Asymptotically (I use 1000000 as a large number below), you would get the same value by simulating from the standard normal:

set.seed(1)
mean(abs(rnorm(1000000)) < 1)
# [1] 0.682331
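To illustrate the "asymptotically" part, a small sketch can repeat that simulation for growing sample sizes. The estimates for small n tend to be further off, and they settle near pnorm(1) - pnorm(-1) as n grows. The sample sizes are arbitrary choices, and the exact simulated numbers depend on the seed:

```r
set.seed(1)
target <- pnorm(1) - pnorm(-1)  # exact value, about 0.6827

# Share of standard normal draws with |z| < 1, for growing sample sizes
ns <- c(100L, 10000L, 1000000L)
sim <- sapply(ns, function(n) mean(abs(rnorm(n)) < 1))

data.frame(n = ns, simulated = sim, exact = target)
```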

To find the proportion of values (x below) within one standard deviation of their average, you can do this:

set.seed(1)
x <- runif(50)
mean(abs(x - mean(x)) < sd(x))
# [1] 0.64

It computes the distance of each value from the mean and compares it to the standard deviation. This produces a logical vector, which is passed to the outer mean(). Because mean() treats FALSE as 0 and TRUE as 1, it returns the proportion of TRUEs, i.e. of values within 1 SD.
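The same idea can be packaged as a small function. This is only a sketch: `prop_within_sd()` and `pop_sd()` are hypothetical names, and `pop_sd()` is meant to mirror rafalib's `popsd()` (population SD, dividing by n) so the block runs without that package. Comparing uniform and normal data shows that the empirical proportion depends on the shape of the data, which is exactly what the pnorm() shortcut ignores:

```r
# Hypothetical helper: share of values within k SDs of their mean.
# `sd_fun` chooses the SD: sd() divides by n - 1, pop_sd() divides by n.
prop_within_sd <- function(x, k = 1, sd_fun = sd) {
  mean(abs(x - mean(x)) < k * sd_fun(x))
}

# Population SD (divides by n), intended to match rafalib::popsd()
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

set.seed(1)
u <- runif(1e5)
z <- rnorm(1e5)

prop_within_sd(u)                   # uniform data: near 1/sqrt(3), about 0.577
prop_within_sd(z)                   # normal data: near pnorm(1) - pnorm(-1), about 0.683
prop_within_sd(u, sd_fun = pop_sd)  # n vs n - 1 barely matters for large n
```

With the question's data this would be something like `prop_within_sd(y, sd_fun = popsd)` after `library(rafalib)`.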

What would you suggest as the best way to get past this? (Sorry, I'm still a beginner in R.) — johnathan_B, Jul 31 '23 at 07:14
@johnathan_B Oh, I see, I was so concentrated on the question's title and the code that I overlooked that you actually mentioned your main goal. I updated the answer to hopefully answer this. — Robert Hacken, Jul 31 '23 at 09:34
