Assume I have a list of length D containing data.table objects. Each data.table has the same columns (X, Y) and the same number of rows N. I'd like to construct another table with N rows, where row n is taken from row n of the table specified by element n of an index vector, also of length N. Restated, each row in the final table is taken from one and only one of the tables in the list, with the index of the source table given by an existing vector.
library(data.table)
N = 100 # rows in each table (actual ~1000000 rows)
D = 4 # number of tables in array (actual ~100 tables)
tableArray = vector("list", D)
for (d in 1:D) {
tableArray[[d]] = data.table(X=rnorm(N), Y=d) # actual ~100 columns
}
tableIndexVector = sample.int(D, N, replace=TRUE) # length N of random 1:D
finalTable = copy(tableArray[[1]]) # just for length and column names
for (n in 1:N) {
finalTable[n] = tableArray[[tableIndexVector[n]]][n]
}
This seems to work the way I want, but the nested indexing is hard to read, and I presume the row-by-row for loop will perform poorly at the actual sizes. It seems like there should be some elegant way of doing this, but I haven't stumbled across it yet. Is there another way of doing this that is efficient and less arcane?
(In case you are wondering, each table in the array represents simulated counterfactual observations for a subject under a particular regime of treatment, and I want to sample from these with different probabilities to test the behavior of different regression approaches with different ratios of regimes observed.)
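For reference, here is a minimal sketch of two vectorized alternatives to the row-by-row loop, not a definitive answer. It reuses the names from the code above; `rows`, `stacked` and `finalTable2` are names introduced only for this illustration, and `set.seed` is there just to make the run reproducible. The first option loops over the D tables rather than the N rows. The second stacks all tables with `rbindlist` and picks the rows in one subset, which assumes the stacked copy fits in memory.

```r
library(data.table)

set.seed(1)  # only so this illustration is reproducible
N <- 100     # rows in each table
D <- 4       # number of tables in the list
tableArray <- lapply(seq_len(D), function(d) data.table(X = rnorm(N), Y = d))
tableIndexVector <- sample.int(D, N, replace = TRUE)

# Option 1: one pass per source table (D iterations instead of N).
finalTable <- copy(tableArray[[1]])  # just for length and column names
for (d in seq_len(D)) {
  rows <- which(tableIndexVector == d)  # rows that should come from table d
  if (length(rows) == 0L) next          # nothing sampled from this table
  # Overwrite those rows, by reference, with the same rows of table d.
  set(finalTable, i = rows, j = names(finalTable), value = tableArray[[d]][rows])
}

# Option 2: stack everything, then select rows in a single subset.
# After rbindlist, row n of table d sits at position (d - 1) * N + n.
stacked <- rbindlist(tableArray)
finalTable2 <- stacked[(tableIndexVector - 1) * N + seq_len(N)]
```

Option 1 avoids building a second copy of all the data, which may matter with roughly 100 tables of a million rows each. Option 2 is shorter but temporarily roughly doubles memory use.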