23

I am simulating data using rnorm, but I need to set an upper and lower limit, does anyone know how to do this?

code:

rnorm(n = 10, mean = 39.74, sd = 25.09)

Upper limit needs to be 340, and the lower limit 0

I am asking this question because I am rewriting an SAS-code into an R-code. I have never used SAS. I am trying to rewrite the following piece of code:

sim_sample(simtot=100000,seed=10004,lbound=0,ubound=340,round_y=0.01,round_m=0.01,round_sd=0.01,n=15,m=39.74,sd=25.11,mk=4)
Jolanda Kossakowski
  • 451
  • 2
  • 6
  • 14
  • 1
    It's not clear what you want to do. The normal distribution is unlimited by definition. Do you want a different distribution (e.g., `runif` that lets you define limits) or discard values outside of you limits? Please clarify. – Roland Oct 13 '13 at 08:27
  • 4
    I want to sample data from a normal distribution, but with limits on the range. In this case, I have a mean of 39.74 and a SD of 25.09, and I need to sample data with this mean and standard deviation, but the numbers cannoot exceed 340. Do I need to use runif for this? – Jolanda Kossakowski Oct 13 '13 at 08:42
  • You don't seem to know what you need. That makes it hard to answer your question. – Roland Oct 13 '13 at 08:56
  • 2
    I do know what I need, I just don't know how to get it. I have a SAS code that I am trying to rewrite in R, and in that SAS code the writer has an upper and a lower limit while sampling with a certain mean and standard deviation. Is sample() an option? I have put the SAS code that I am trying to rewrite in the question – Jolanda Kossakowski Oct 13 '13 at 10:14

6 Answers6

32

The rtruncnorm() function will return the results you need.

  library(truncnorm)
  rtruncnorm(n=10, a=0, b=340, mean=39.4, sd=25.09)
charlie
  • 602
  • 4
  • 12
15

You can make your own truncated normal sampler that doesn't require you to throw out observations quite simply

rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
    qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}
Dason
  • 60,663
  • 9
  • 131
  • 148
10

Like this?

mysamp <- function(n, m, s, lwr, upr, nnorm) {
  samp <- rnorm(nnorm, m, s)
  samp <- samp[samp >= lwr & samp <= upr]
  if (length(samp) >= n) {
    return(sample(samp, n))
  }  
  stop(simpleError("Not enough values to sample from. Try increasing nnorm."))
}

set.seed(42)
mysamp(n=10, m=39.74, s=25.09, lwr=0, upr=340, nnorm=1000)
#[1] 58.90437 38.72318 19.64453 20.24153 39.41130 12.80199 59.88558 30.88578 19.66092 32.46025

However, the result is not normal distributed and usually won't have the mean and sd you've specified (in particular if the limits are not symmetric around the specified mean).

Edit:

According to your comment it seems you want to translate this SAS function. I am not an SAS user, but this should do more or less the same:

mysamp <- function(n, m, s, lwr, upr, rounding) {
  samp <- round(rnorm(n, m, s), rounding)
  samp[samp < lwr] <- lwr
  samp[samp > upr] <- upr
  samp
}

set.seed(8)
mysamp(n=10, m=39.74, s=25.09, lwr=0, upr=340, rounding=3)
#[1] 37.618 60.826 28.111 25.920 58.207 37.033 35.467 12.434  0.000 24.857

You may then want to use replicate to run the simulations. Or if you want faster code:

sim <- matrix(mysamp(n=10*10, m=39.74, s=25.09, lwr=0, upr=340, rounding=3), 10)
means <- colMeans(sim)
sds <- apply(sim, 2, sd)
Community
  • 1
  • 1
Roland
  • 127,288
  • 10
  • 191
  • 288
0

Assuming you want exactly 10 numbers and not the subset of them that is >0, <340 (and night not be a normal distribution):

    aa <- rnorm(n = 10, mean = 39.74, s = 25.09)

    while(any(aa<0 | aa>340)) { aa <- rnorm(n = 10, mean = 39.74, s = 25.09) }
alexis_laz
  • 12,884
  • 4
  • 27
  • 37
  • Depending on n and the limits set this function could take a *very* long time to generate the sample of interest – Dason Oct 13 '13 at 21:52
0

This is the function that I wrote to achieve the same purpose. It normalizes the result from the rnorm function and then adjusts it to fit the range.

NOTE: The standard deviation and mean (if specified) get altered during the normalization process.

#' Creates a random normal distribution within the specified bounds.
#' 
#' WARNING: This function does not preserve the standard deviation or mean.
#' @param n The number of values to be generated
#' @param mean The mean of the distribution
#' @param sd The standard deviation of the distribution
#' @param lower The lower limit of the distribution
#' @param upper The upper limit of the distribution
rtnorm <- function(n, mean=NA, sd=1, lower=-1, upper=1){
  mean = ifelse(is.na(mean)|| mean < lower || mean > upper,
                mean(c(lower, upper)), mean)
  data <- rnorm(n, mean=m, sd=sd) # data

  if (!is.na(lower) && !is.na(upper)){ # adjust data to specified range
    drange <- range(data)           # data range
    irange <- range(lower, upper)   # input range
    data <- (data - drange[1])/(drange[2] - drange[1]) # normalize data (make it 0 to 1)
    data <- (data * (irange[2] - irange[1]))+irange[1] # adjust to specified range
  }
  return(data)
}
Alex Essilfie
  • 12,339
  • 9
  • 70
  • 108
  • Note that this isn't the same as what is typically referred to as a truncated normal. – Dason Apr 08 '19 at 18:19
  • @Dason: No, it is not a truncated normal distribution. It rather expands/reduces the bounds of a standard normal distribution to meet the specified upper and lower limits. – Alex Essilfie Apr 09 '19 at 15:21
0

There are several ways to set upper and lower limits to a normal distribution, what will cause that the result is no longer normal distributed.

Assuming a mean=0, sd=1 producing N=1e5 values with a lower boundary of LO=-1 and an upper boundary of UP=2.

N <- 1e5L
LO <- -1
UP <- 2

Move outliers to border (@Roland)

set.seed(42)
x <- pmax(LO, pmin(UP, rnorm(N)))
mean(x)
#[1] 0.07238029
median(x)
#[1] -0.002066374
sd(x)
#[1] 0.8457605
hist(x, 30)

hist of Set outliers to borders

Cut outliers of (@Dason, @Roland, truncnorm::rtruncnorm, MCMCglmm::rtnorm)

set.seed(42)
x <- qnorm(runif(N, pnorm(LO), pnorm(UP)))
mean(x)
#[1] 0.2317875
median(x)
#[1] 0.173679
sd(x)
#[1] 0.7236536

hist of Cut outliers of

Scale (@Alex Essilfie)

set.seed(42)
x <- rnorm(N)
x <- (x-min(x))/(max(x)-min(x))*(UP-LO)+LO
mean(x)
#[1] 0.4474876
median(x)
#[1] 0.4482257
sd(x)
#[1] 0.3595199

hist of scale

Combination of methods. E.g. Cut and scale:

set.seed(42)
x <- qnorm(runif(N, pnorm(-3), pnorm(3)))
x <- (x-min(x))/(max(x)-min(x))*(UP-LO)+LO
mean(x)
#[1] 0.5010759
median(x)
#[1] 0.5014713
sd(x)
#[1] 0.4957751

hist of Combination

Asymmetric combination

set.seed(42)
n <- round(N*abs(LO)/diff(range(c(LO, UP))))
x <- c(qnorm(runif(n, pnorm(-3), 0.5)), qnorm(runif(N-n, 0.5, pnorm(3))))
x <- ifelse(x < 0, x/min(x)*LO, x/max(x)*UP)
mean(x)
#[1] 0.2651627
median(x)
#[1] 0.2127903
sd(x)
#[1] 0.5078264

hist of asymmetric combination

GKi
  • 37,245
  • 2
  • 26
  • 48