For a coding challenge on a learning platform, I was asked to compute sampling errors for 100 different sample sizes. My approach does not produce the same values as the provided solution, and I do not understand why: to me, the two seem to do the same thing. Am I missing something? I am a coding beginner, so that is entirely possible!
Here is the setup for the challenge:
set.seed(4)
parameter <- mean(houses$SalePrice) # parameter value = 180796.1
sample_sizes <- seq(from = 5, by=29, length.out=100)
library(purrr)
Here is my approach:
sample_means <- map_dbl(sample_sizes, function(x) mean(sample(houses$SalePrice, size=x)))
sampling_errors_a <- parameter - sample_means
Here is the provided solution:
sampling_errors <- map_dbl(sample_sizes, function(x) parameter - mean(sample(houses$SalePrice, size=x)))
When I run identical(sampling_errors_a, sampling_errors), R keeps returning FALSE. I looked at the values of both vectors and they are in fact completely different.
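In case it helps anyone reproduce this without the houses data, here is a minimal, self-contained sketch. Everything in it that is not in my code above is an assumption for illustration: prices is made-up stand-in data (not the real SalePrice column), the helper names errors_two_step and errors_one_step are hypothetical, and I swapped purrr's map_dbl for base R's vapply so it runs without extra packages. It runs the two approaches once back to back after a single set.seed(4) call, and once with set.seed(4) called again right before each approach, in case the state of the random number generator matters here.

```r
# Stand-in data: made-up prices, NOT the real houses$SalePrice column.
set.seed(1)
prices <- round(runif(3000, min = 50000, max = 500000))

parameter <- mean(prices)
sample_sizes <- seq(from = 5, by = 29, length.out = 100)

# Approach A: collect the sample means first, then subtract in one
# vectorised step (mirrors "my approach").
errors_two_step <- function() {
  sample_means <- vapply(sample_sizes,
                         function(n) mean(sample(prices, size = n)),
                         numeric(1))
  parameter - sample_means
}

# Approach B: subtract inside the function applied to each sample size
# (mirrors the provided solution).
errors_one_step <- function() {
  vapply(sample_sizes,
         function(n) parameter - mean(sample(prices, size = n)),
         numeric(1))
}

# Variant 1: a single set.seed() call, both approaches run back to back.
set.seed(4)
a_shared <- errors_two_step()
b_shared <- errors_one_step()
same_with_shared_seed <- identical(a_shared, b_shared)

# Variant 2: set.seed() called again right before each approach.
set.seed(4)
a_reseeded <- errors_two_step()
set.seed(4)
b_reseeded <- errors_one_step()
same_with_reseed <- identical(a_reseeded, b_reseeded)

print(c(shared_seed = same_with_shared_seed, reseeded = same_with_reseed))
```

Comparing the two printed values should show whether the difference comes from the arithmetic itself or from which random samples each approach happens to draw.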
I would love to understand why the two approaches do not arrive at the same result. If somebody has a moment to explain, I would very much appreciate it. Thank you all in advance!