Discussion:
Hi. It is very interesting problem. It needs quite an effort to be solved properly and not always solution can be found.
First thing is that when you truncate a distribution (set a min and max for it) standard deviation is limited (has a maximum depending on min and max values). If you want too big value of it - you can not get it.
Second restriction limits mean. It is obvious that if you want mean below minimum and above maximum it will not work, but you may want something too close to limits and still it can not be satisfied.
Third restriction limits a combination of this parameters. Im not sure how does it work, but i am pretty sure not all the combinations may be satisfied.
But there are some combinations that may work and may be found.
Solution:
The problem is: what are the parameters: mean
and sd
of truncated (cut) distribution with defined limits a
and b
, so in the end the mean will be equal to desired_mean
and standard deviation will be equal to desired_sd
.
It is important that values of parameters: mean
and sd
are used before truncation. So that is why in the end mean and deviation are diffrent.
Below is the code that solves the problem using function optim()
. It may not be the best solution for this problem, but it generally works:
require(truncnorm)
eval_function <- function(mean_sd){
mean <- mean_sd[1]
sd <- mean_sd[2]
sample <- rtruncnorm(n = n, a = a, b = b, mean = mean, sd = sd)
mean_diff <-abs((desired_mean - mean(sample))/desired_mean)
sd_diff <- abs((desired_sd - sd(sample))/desired_sd)
mean_diff + sd_diff
}
n = 1000
a <- 1
b <- 6
desired_mean <- 3
desired_sd <- 1
set.seed(1)
o <- optim(c(desired_mean, desired_sd), eval_function)
new_n <- 10000
your_sample <- rtruncnorm(n = new_n, a = a, b = b, mean = o$par[1], sd = o$par[2])
mean(your_sample)
sd(your_sample)
min(your_sample)
max(your_sample)
eval_function(c(o$par[1], o$par[2]))
I am very interested if there are other solutions to that problem, so please post them if you find other answers.
EDIT:
@Mikko Marttila: Thanks to your comment and link: Wikipedia I implemented formulas to calculate mean and sd of truncated distribution. Now the solution is WAY more elegant and it should calculate quite accurately mean and sd of the desired distribution if they exist. It works much faster also.
I implemented eval_function2
which should be used in the optim()
function instead of previous one:
eval_function2 <- function(mean_sd){
mean <- mean_sd[1]
sd <- mean_sd[2]
alpha <- (a - mean)/sd
betta <- (b - mean)/sd
trunc_mean <- mean + sd * (dnorm(alpha, 0, 1) - dnorm(betta, 0, 1)) /
(pnorm(betta, 0, 1) - pnorm(alpha, 0, 1))
trunc_var <- (sd ^ 2) *
(1 +
(alpha * dnorm(alpha, 0, 1) - betta * dnorm(betta, 0, 1))/
(pnorm(betta, 0, 1) - pnorm(alpha, 0, 1)) -
(dnorm(alpha, 0, 1) - dnorm(betta, 0, 1))/
(pnorm(betta, 0, 1) - pnorm(alpha, 0, 1)))
trunc_sd <- trunc_var ^ 0.5
mean_diff <-abs((desired_mean - trunc_mean)/desired_mean)
sd_diff <- abs((desired_sd - trunc_sd)/desired_sd)
}