1) Is it possible to do operations (multiplication, division, addition, subtraction) between unequal-sized data.tables using data.table, or will it have to be done with data.frame?
The following example is a simplified version of my original posting. In my actual data set, the columns would be A1:A12, B1:B12, C1:C12, E1:E12, F1:F12, etc. I've added columns J and K to get closer to my original data set and to show that I cannot do the following with a matrix.
# Sample Data
library(data.table)
input1a <- data.table(ID = c(37, 45, 900),
A1 = c(1, 2, 3),
A2 = c(43, 320, 390),
B1 = c(-0.94, 2.2, -1.223),
B2 = c(2.32, 4.54, 7.21),
C1 = c(1, 2, 3),
C2 = c(-0.94, 2.2, -1.223),
D = c(43, 320, 390),
J = paste0("measurement_1", 1:3),
K = paste0("type_1", 1:3))
setkey(input1a, ID)
input1a
# ID A1 A2 B1 B2 C1 C2 D J K
# 1: 37 1 43 -0.940 2.32 1 -0.940 43 measurement_11 type_11
# 2: 45 2 320 2.200 4.54 2 2.200 320 measurement_12 type_12
# 3: 900 3 390 -1.223 7.21 3 -1.223 390 measurement_13 type_13
input2a <- data.table(ID = c(37, 45, 900),
E1 = c(23, -0.2, 12),
E2 = c(-0.33, -0.012, -1.342))
setkey(input2a, ID)
input2a
# ID E1 E2
# 1: 37 23.0 -0.330
# 2: 45 -0.2 -0.012
# 3: 900 12.0 -1.342
# Intended calculation, by column name: outputa <- 0.00066 * B1:B2 * D * E1:E2
# (this drops the key, but I would like to keep it)
outputa <- 0.00066 * input1a[, c(4:5), with = FALSE] *
  input1a[, 8, with = FALSE] * input2a[, c(2:3), with = FALSE]
setnames(outputa, 1:2, c("F1", "F2")) # outputa has only two columns
Result using outputa
outputa # using existing code and gives a result with no keys
# F1 F2
# 1: -0.6135756 -0.02172773
# 2: -0.0929280 -0.01150618
# 3: -3.7776024 -2.49055607
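A note on question 1): in current versions of R, arithmetic between two data.frame-like objects appears to be defined only when both have the same dimensions, so a 3 x 2 table times a 3 x 1 table may stop with an error. The following is a minimal sketch, not a definitive answer, that pulls D out as a plain vector with `[[` so that it is recycled down each column. The name `outputa_vec` is made up for illustration, and only the columns used in the calculation are recreated.

```r
library(data.table)

# Only the columns needed for the calculation, copied from the sample data
input1a <- data.table(ID = c(37, 45, 900),
                      B1 = c(-0.94, 2.2, -1.223),
                      B2 = c(2.32, 4.54, 7.21),
                      D  = c(43, 320, 390))
input2a <- data.table(ID = c(37, 45, 900),
                      E1 = c(23, -0.2, 12),
                      E2 = c(-0.33, -0.012, -1.342))

# input1a[["D"]] is a numeric vector (not a one-column table),
# so it is recycled down each column of the 3 x 2 table
scaled <- 0.00066 * input1a[, c("B1", "B2"), with = FALSE] * input1a[["D"]]

# Both operands are now 3 x 2, so element-wise multiplication is defined
outputa_vec <- scaled * input2a[, c("E1", "E2"), with = FALSE]
setnames(outputa_vec, c("F1", "F2"))  # hypothetical result name: outputa_vec
```

This matches rows by position only, so it assumes both tables are sorted the same way.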
In the following code I took outputa, which did not keep the keys, and rewrote outputa as outputause. I would like to have the following question answered so that I can perform the needed operations on the data set while keeping the keys intact.
2) How can the following code be rewritten with x defined for each group of columns? This question stems from "Weighted sum of variables by groups with data.table" and my trouble replicating any of its answers with my data set.
Each group of columns is defined below:
- A1:A2 (input1a[, 2:3]),
- B1:B2 (input1a[, 4:5]), and
- D (input1a[, 8])
In outputause, if input1a[, c(4:5), with = FALSE] was the only group from input1a, then it alone would be x. What about when you have more than one group from a single data.table, as shown below?
outputause <- input1a[, lapply(.SD, function(x) {
0.00066 * input1a[, c(4:5), with = FALSE] * input1a[, 8, with = FALSE] *
input2a[, c(2, 3), with = FALSE]
}), by = key(input1a)] # keeping keys intact
setnames(outputause, 2:3, c("F1", "F2"))
Result using outputause
outputause # using revised code and result includes the keys
# ID F1 F2
# 1: 37 -0.6135756 -0.02172773
# 2: 45 -0.0929280 -0.01150618
# 3: 900 -3.7776024 -2.49055607
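For reference, here is a minimal sketch of one way the calculation could be written so that each group of columns is named explicitly and the key is kept. It assumes every ID appears exactly once in both tables. The names `joined`, `b_cols`, `e_cols`, `f_cols` and `outputause_alt` are hypothetical, and this is only one possible approach, not necessarily the idiomatic data.table one.

```r
library(data.table)

# Only the columns needed for the calculation, copied from the sample data
input1a <- data.table(ID = c(37, 45, 900),
                      B1 = c(-0.94, 2.2, -1.223),
                      B2 = c(2.32, 4.54, 7.21),
                      D  = c(43, 320, 390))
input2a <- data.table(ID = c(37, 45, 900),
                      E1 = c(23, -0.2, 12),
                      E2 = c(-0.33, -0.012, -1.342))
setkey(input1a, ID)
setkey(input2a, ID)

# Keyed join: rows are matched on ID rather than on position
joined <- input1a[input2a]

# Hypothetical helper vectors naming each group of columns
b_cols <- c("B1", "B2")
e_cols <- c("E1", "E2")
f_cols <- c("F1", "F2")

# Pair B1 with E1 and B2 with E2; D is a single column shared by every pair
f_vals <- Map(function(b, e) 0.00066 * joined[[b]] * joined[["D"]] * joined[[e]],
              b_cols, e_cols)

# Assemble the result with ID first, then restore the key explicitly
outputause_alt <- as.data.table(c(list(ID = joined[["ID"]]),
                                  setNames(f_vals, f_cols)))
setkey(outputause_alt, ID)
```

With 12 columns per group, the same pattern should extend by building the name vectors with `paste0("B", 1:12)` and so on, provided the groups have equal length.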
UPDATE
input2at <- data.table(t(input2a))
inputs <- data.table(input1a, input2at)
I have transposed input2a and combined it with input1a in the data.table inputs. In this simple example I had 3 rows, but in my actual data set I'll have close to 1300 rows. This is the reason why I've asked question 2).
Thank you.