I have a list of data tables that looks like like this:
group1 <- data.table(
group = rep(x = c("group1"), each = 16),
amount = rep(x = 7:4, each = 4),
subgr = rep(x = 1:2, each = 8),
ind = rep(x = 0:1, each = 4, times = 2)
)
group2 <- data.table(
group = rep(x = c("group2"), each = 36),
amount = rep(x = 13:8, each = 6),
subgr = rep(x = 1:3, each = 12),
ind = rep(x = 0:1, each = 6, times = 3)
)
mydt <- rbind(group1, group2)
mydt <- lapply(X = split(x = 1:nrow(mydt), f = mydt[["group"]]),
FUN = function(i)mydt[i])
The object presented above is oversimplified, the actual list contains much more and much bigger data.table
s, each with slightly different structure in terms of number of rows distributed across subgr
and number of subgr
themselves. What I want to achieve is:
- Create multiple columns in each
data.table
in the list that is equal to the number of unique values insubgr
. Each new column is a copy of theamount
. The number of copied columns will be equal to the number of unique values insubgr
. - Modify the newly created columns within each
subgr
(sayamount*2
ifind == 1
andamount*4
ifind ==0
), leaving the rest of the values in the subgroups insubgr
unaffected.
That is, to have something like this (only mydt$group1
shown here, but it applies for all tables):
$group1
group amount subgr ind am1 am2
1: group1 7 1 0 28 7
2: group1 7 1 0 28 7
3: group1 7 1 0 28 7
4: group1 7 1 0 28 7
5: group1 6 1 1 12 6
6: group1 6 1 1 12 6
7: group1 6 1 1 12 6
8: group1 6 1 1 12 6
9: group1 5 2 0 5 20
10: group1 5 2 0 5 20
11: group1 5 2 0 5 20
12: group1 5 2 0 5 20
13: group1 4 2 1 4 8
14: group1 4 2 1 4 8
15: group1 4 2 1 4 8
16: group1 4 2 1 4 8
I know that splitting a data.table
into list of data.table
s is not a good idea, as mentioned in this postbut this is how the object is. Besides that, the split is related with the task I need to perform:
- The data tables contain different number of rows.
- The rows are grouped into subgroups defined by
subgr
and their number also differs across the different data tables, i.e. the number of new columns will differ across the entire list.
That is, the entire data.table
can't be processed at once because different number of columns will be created for each group in the group
variable.
What I tried so far is writing a function using the second solution in the accepted answer from this post:
myfun <- function(data, quantity, region, index) {
data <- lapply(data, function(i) {
i[ , eval(paste0("am", unique(i[[region]]))) := i[[quantity]]]
})
data <- lapply(X = data, FUN = function(i) {
rep.names <- paste0("am", unique(i[[region]]))
i[ , eval(rep.names) := lapply(.SD, function(j) {
ifelse(i[["ind"]] == 1L, j*2L, j*4L)
}), by = region, .SDcols = rep.names]
})
return(data)
}
myfun(mydt, quantity = "amount", region = "subgr", index = "ind")
It does not work as intended, it modifies the whole range of values within all variables according to the condition. It throws warnings, though, which point the problem. Here is just the first warning, the other are the same:
Warning messages:
1: In `[.data.table`(i, , `:=`(eval(rep.names), lapply(.SD, ... :
RHS 1 is length 16 (greater than the size (8) of group 1). The last
8 element(s) will be discarded.
That is, it uses just the rows it has to on the LHS, but then takes the entire column for the RHS. Apparently I am missing something important here. The difference with the second solution from the accepted answer from [this post][3] is that there are multiple columns to use, while in my case is just one (amount
).
Can someone help?