How to use the apply family of functions to adjust values of a list by grouped index

Question

Ok so I have an excel sheet with a variety of scenarios and values, and what I'd like to do is transform some of those values by using a random distribution. I'm able to do that one scenario at at time, but I'd like to be able to do it in a more compact way, possibly with the apply function family. Here is a small version of my matrix which I use as a data.table with setDT:

matrixfromexcel = 

Scenario char num1 num2 num3 val1 val2 val3

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

1        1    0    4    8    1.22 2.31 7.33

2        5    2    0    1    4.2  5.011 12.542

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542 

2        5    2    0    1    4.2  5.011 12.542

...

1200    66    8    1    0    555  120 1700

So as you can see, the scenario number separates the values into groups, and there is a large number of scenarios, up to 1000's+. Here is what I've used in order to add random numbers from a normally distributed function to the values of one column of one scenario:

matrixfromexcel[Scenario == 1, val1 := val1+rnorm(8, 1.22, 1)]

Where 8 is the number of different random numbers, 1.22 is the value I want the mean centered at, and 1 is the # of standard deviations I want in the random numbers.

So if I wanted to loop around from Scenario 1 to 1000, should I try an apply function or just try to use a loop? If apply function, could you show me your suggestion? Thank you

fyi -- in `rnorm(8, 1.22, 1)` 1 is the _value_ of one standard deviation, rather than the number of standard deviations. `sd(rnorm(x,y,z))` will by definition have a value close to `z` — arvi1000, Jan 16 '19 at 17:52
I don't understand the intention. If you're just adding a random value to each, why not do it all at once i.e. `matrixfromexcel[Scenario %in% 1:1000, val1 := val1 + rnorm(length(val1), 1.22, 1)]`? What are you hoping to achieve by adding a random value for each `Scenario` through a series of separate steps? — IceCreamToucan, Jan 16 '19 at 18:25
Hello @IceCreamToucan (love the name), because I'm not sure if I will have the same parameters for each number. The SD or # of values for the random numbers generated may change for each scenario, and so I want to have the granularity to change that if needed. Feel free to suggest a solution without it too, for reference if you'd like. — SqueakyBeak, Jan 16 '19 at 18:27

score 1 · Accepted Answer · answered Jan 16 '19 at 17:56

You can leverage the by argument in data.table and the special operator .N which refers to the # of rows within each group. Here's something to get you started:

library(data.table)
#> Warning: package 'data.table' was built under R version 3.4.4
dt <- data.table(Scenario = rep(c(1,2,3), times = c(8,5,3)), val1 = rep(c(1.22, 4.2, 6), times = c(8,5,3)))
dt[, new_val := val1 + rnorm(.N, val1, 1), keyby = Scenario]

^{Created on 2019-01-16 by the reprex package (v0.2.1)}

For transparency, I created new_val versus overwriting val1 but you can modify that as you see fit. Also note, you currently passed in 1 to the sd parameter for rnorm(). If that's what you intended, great. If not, modify accordingly.

Ok great @Chase, thank you! If I wanted to specify Scenario with a start and end value, then could I do something like 1:1:1200? — SqueakyBeak, Jan 16 '19 at 18:25
and if someone wants to use a character value instead of numeric please see this thread: https://stackoverflow.com/questions/24536771/conditionally-replacing-column-values-with-data-table — SqueakyBeak, Jan 23 '19 at 19:46

How to use the apply family of functions to adjust values of a list by grouped index

1 Answers1