I am simulating data for a research project, and the simulation is taking a long time. I would like to run some experiments, but I do not yet have enough simulated data for this to be practical. I would therefore like to supplement the data I already have with random, normally distributed data.

Thus far, I have a data frame that looks like this:

Training_Data <- data.frame( A = runif(5), B = runif(5), C = runif(5), D = runif(5) )

I then took summary statistics of this data frame as shown:

Training_Data_Sum <- as.data.frame(apply(Training_Data[1:4], 2, summary))

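which gives me the min, max, mean, median, etc. for each column of data. (Note that summary() does not report the standard deviation, so that has to be computed separately with sd().)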

What I would like to do now is write a function that takes the 5 rows of data in the Training_Data data frame and expands them to 50 rows of normally distributed data, using the min, max, mean, and STD values of each column of Training_Data.

I am assuming that I would need to use the rtruncnorm function as follows:

Training_Data_50A <- rtruncnorm(n=50, A_min, A_max, A_mean, A_std)
Training_Data_50B <- rtruncnorm(n=50, B_min, B_max, B_mean, B_std)
Training_Data_50C <- rtruncnorm(n=50, C_min, C_max, C_mean, C_std)
Training_Data_50D <- rtruncnorm(n=50, D_min, D_max, D_mean, D_std)

where the min, max, mean, and std values are obtained from the appropriate column.

Could someone point me in the correct direction on how to convert this task into a proper R function?

Eazie
  • I think you are looking for the Box–Muller transform. If you want a truncated distribution based on the min/max this would be a Monte Carlo experiment. You simply reject any value drawn that is outside of the range of the data. So you keep drawing until you accept 50 values. – Baraliuh May 31 '21 at 16:04
  • Normally distributed data only has two parameters: mean and SD. You are overspecifying. – IRTFM May 31 '21 at 16:05
  • Check this question: https://stackoverflow.com/questions/19343133/setting-upper-and-lower-limits-in-rnorm – Pedro Alencar May 31 '21 at 16:26

1 Answer

I am not a mind reader, but I guess this is what you are looking for:

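library(dplyr)  # provides between() (and the %>% pipe)

# Draw n values from N(mean, std), rejecting any draw outside [min, max].
# Note: this definition masks truncnorm::rtruncnorm if that package is loaded.
rtruncnorm <- function(n, min, max, mean, std){
  accepted_moves <- numeric(n)
  i <- 1
  while(i <= n){
    draw <- rnorm(1, mean, std)
    # Keep the draw only if it lies inside the observed range
    if(between(draw, min, max)){
      accepted_moves[i] <- draw
      i <- i+1
    }
  }
  return(accepted_moves)
}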

Simulation:

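input_data <- runif(5)
sum_data <- summary(input_data)

# summary() returns Min., 1st Qu., Median, Mean, 3rd Qu., Max.; index by name
# so the mean (position 4) is not confused with the 3rd quartile (position 5).
rtruncnorm(50, sum_data[["Min."]], sum_data[["Max."]], sum_data[["Mean"]], sd(input_data))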


[1] 0.5259511 0.5575217 0.6253954 0.8497881 0.8902441 0.8462771 0.4441249 0.6323441 0.9069752 0.4665141 0.4922236 0.9103832
[13] 0.6352267 0.5996836 0.5647709 0.6622921 0.4687262 0.4164213 0.1878030 0.5707349 0.8617818 0.4060878 0.7911329 0.5712865
[25] 0.8958417 0.3603563 0.5451828 0.8638422 0.7079184 0.5580455 0.9099664 0.1308865 0.8396717 0.7088652 0.7627120 0.5839610
[37] 0.7446260 0.6821685 0.4831258 0.6643238 0.4619952 0.3614351 0.5678148 0.5655968 0.5316892 0.4885681 0.6507399 0.5020127
[49] 0.5227599 0.5890428
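
To apply the same idea to every column of the data frame from the question in one call, something like the following might work. This is only a sketch in base R: expand_truncnorm and draw_column are hypothetical names, the fixed seed is there purely to make the example repeatable, and it assumes each column has at least two distinct values so that sd() is positive.

```r
# Sketch: expand every column of a data frame to n truncated-normal draws.
# expand_truncnorm is a hypothetical helper name, not part of any package.
expand_truncnorm <- function(df, n = 50) {
  draw_column <- function(x) {
    lo <- min(x)
    hi <- max(x)
    mu <- mean(x)
    sigma <- sd(x)
    out <- numeric(0)
    # Rejection sampling in batches: keep only draws inside [lo, hi]
    while (length(out) < n) {
      cand <- rnorm(n, mean = mu, sd = sigma)
      out <- c(out, cand[cand >= lo & cand <= hi])
    }
    out[seq_len(n)]
  }
  # lapply() over a data frame visits one column at a time and keeps the names
  as.data.frame(lapply(df, draw_column))
}

set.seed(1)  # assumption: fixed seed only so the example is repeatable
Training_Data <- data.frame(A = runif(5), B = runif(5), C = runif(5), D = runif(5))
Training_Data_50 <- expand_truncnorm(Training_Data, n = 50)
dim(Training_Data_50)
```

Because values outside the observed range are discarded, the mean and standard deviation of the generated columns will generally differ somewhat from those of the original 5 rows.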
Baraliuh