1) Is it possible to do operations (multiplication, division, addition, subtraction) between unequal-sized data.tables using data.table or will it have to be done with data.frame?

The following example is a simplified version of my original posting. In my actual data set, the columns would be A1:A12, B1:B12, C1:C12, E1:E12, F1:F12, etc. I've added columns J and K to stay close to my original data set and to show that I cannot do the following with a matrix.

# Sample Data
library(data.table)
input1a <- data.table(ID = c(37, 45, 900), 
                      A1 = c(1, 2, 3), 
                      A2 = c(43, 320, 390), 
                      B1 = c(-0.94, 2.2, -1.223), 
                      B2 = c(2.32, 4.54, 7.21), 
                      C1 = c(1, 2, 3), 
                      C2 = c(-0.94, 2.2, -1.223), 
                      D = c(43, 320, 390), 
                      J = paste0("measurement_1", 1:3), 
                      K = paste0("type_1", 1:3))
setkey(input1a, ID)
input1a
#      ID  A1  A2     B1   B2  C1     C2   D              J       K
#  1:  37   1  43 -0.940 2.32   1 -0.940  43 measurement_11 type_11
#  2:  45   2 320  2.200 4.54   2  2.200 320 measurement_12 type_12
#  3: 900   3 390 -1.223 7.21   3 -1.223 390 measurement_13 type_13

input2a <- data.table(ID = c(37, 45, 900), 
                      E1 = c(23, -0.2, 12), 
                      E2 = c(-0.33, -0.012, -1.342))
setkey(input2a, ID)
input2a
#     ID   E1     E2
# 1:  37 23.0 -0.330
# 2:  45 -0.2 -0.012
# 3: 900 12.0 -1.342

# D is extracted as a vector with [[ so that it is recycled across B1:B2
outputa <- 0.00066 * input1a[, c(4:5), with = FALSE] *
  input1a[[8]] * input2a[, c(2:3), with = FALSE] # no keys, but would
# like to keep the keys
# outputa <- 0.00066 * B1:B2 * D * E1:E2 / referring back to the column names
setnames(outputa, 1:2, c("F1", "F2"))

Result using outputa

outputa # using existing code and gives a result with no keys
#            F1             F2
# 1: -0.6135756    -0.02172773
# 2: -0.0929280    -0.01150618
# 3: -3.7776024    -2.49055607
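
As a hedged aside on keeping the key: the arithmetic above returns only the value columns, so `ID` and the key are lost. A minimal sketch, assuming both tables hold the same IDs in the same order, does the multiplication on column vectors and then re-attaches and re-keys `ID`. The names `dt1`, `dt2`, `vals`, and `out` are illustrative only and are not part of the original code.

```r
library(data.table)

# Trimmed stand-ins for input1a and input2a (same values as the sample data)
dt1 <- data.table(ID = c(37, 45, 900),
                  B1 = c(-0.94, 2.2, -1.223),
                  B2 = c(2.32, 4.54, 7.21),
                  D  = c(43, 320, 390))
dt2 <- data.table(ID = c(37, 45, 900),
                  E1 = c(23, -0.2, 12),
                  E2 = c(-0.33, -0.012, -1.342))
setkey(dt1, ID)
setkey(dt2, ID)

# Assumption: identical IDs in identical order, so rows line up by position
stopifnot(identical(dt1$ID, dt2$ID))

# For i = 1, 2 multiply the vectors B<i>, D and E<i> elementwise
vals <- lapply(1:2, function(i) {
  0.00066 * dt1[[paste0("B", i)]] * dt1[["D"]] * dt2[[paste0("E", i)]]
})

# Re-attach ID and restore the key on the result
out <- data.table(ID = dt1$ID, F1 = vals[[1]], F2 = vals[[2]])
setkey(out, ID)
out
```

Because every operand is a plain vector of the same length, no table-by-table arithmetic is involved, and the key is set explicitly at the end.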

In the following code I took outputa, which did not keep the keys, and rewrote outputa as outputause. I would like to have the following question answered so that I can perform the needed operations on the data set while keeping the keys intact.

2) How can the following code be rewritten with x defined for each group of columns? This question stems from "Weighted sum of variables by groups with data.table" and my trouble trying to replicate any of the answers with my data set.

Each group of columns is defined below:

  • A1:A2 (input1a[, 2:3]),
  • B1:B2 (input1a[, 4:5]), and
  • D (input1a[, 8])

In outputause, if input1a[, c(4:5), with = FALSE] was the only group from input1a, then alone it would be x.

What about when you have more than one group from a single data.table as is shown below?

outputause <- input1a[, lapply(.SD, function(x) {
    0.00066 * input1a[, c(4:5), with = FALSE] * input1a[, 8, with = FALSE] * 
      input2a[, c(2, 3), with = FALSE]
  }), by = key(input1a)] # keeping keys intact
setnames(outputause, 2:3, c("F1", "F2"))

Result using outputause

outputause # using revised code and result includes the keys
#    ID             F1               F2
# 1: 37    -0.6135756       -0.02172773
# 2: 45    -0.0929280       -0.01150618
# 3: 900   -3.7776024       -2.49055607

UPDATE

input2at <- data.table(t(input2a))
inputs <- data.table(input1a, input2at)

I have transposed input2a and combined it with input1a in the data.table inputs. In this simple example I had 3 rows, but in my actual data set I'll have close to 1300 rows. This is the reason why I've asked question 2).

Thank you.

iembry
  • I'm not sure what you are trying to do there, but why are you using `data.table` for it? What you show would be better done with matrices. PS: Please put some line breaks in your code. Horizontal scrolling makes it very difficult to read. – Roland Jul 24 '14 at 17:35
  • Perhaps you could illustrate your problem with a `2x3` table and a `2x2` table? And include your expected output? Right now things are very unclear. – Gregor Thomas Jul 24 '14 at 20:42
  • @Roland Thank you for your comments and code fixes. I've added 2 columns of text to indicate why I'm using `data.table` rather than a `matrix`. Output in the example represents the most complicated equation that I'm solving using a variety of coefficients, monthly data (the set of 12 columns) for about 1300 sites, and location specific data. For each equation solved (around 15), I need to keep the key which is an identifier for each site. – iembry Jul 24 '14 at 20:46
  • imho, your first order of business should be getting rid of the duplicated column names. Then you should possibly reshape to long format and definitely join the data.tables. – Roland Jul 25 '14 at 06:29
  • Your question is too convoluted. The initial simple question is fine, but then you go on to include a huge code dump not entirely related to your specific question. – Joe Jul 25 '14 at 17:31
  • I appreciate that you're trying to simplify, but "how can I rewrite this code [8 lines of dense, uncommented code]" is not a good question. Please try to make a **minimal** example, with just one or two operations, that you can try to generalize on your own. And don't just add it at the bottom, edit your question removing all the unnecessary details--it's far too long to be approachable at this point. – Gregor Thomas Jul 25 '14 at 18:07
  • Thank you for your comments. I have made revisions to the original statement and questions. – iembry Jul 30 '14 at 18:40
  • @iembry This is getting there, but please describe **in words** what you're doing. What do you mean by "group of columns"? You've provided your input data, could you show your desired output? – Gregor Thomas Jul 30 '14 at 18:48
  • @Gregor I've revised the posting. Thank you. – iembry Jul 31 '14 at 21:01

1 Answer

I am answering my own question based on an answer provided to me in "R data.table operations with multiple groups in single data.table and outside function with lapply".

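# Column-bind the two tables (rows are assumed to be in the same ID order)
outputa <- data.table(input1a, input2a)
# Rename D to D1 and copy it to D2 so that each index i has its own D column
setnames(outputa, 8, "D1")
outputa[, D2 := D1]

fun <- function(B, D, E) 0.00066 * B * D * E

# Apply fun to (B1, D1, E1) and (B2, D2, E2), keeping ID in the result
outputa[, lapply(1:2, function(i) fun(get(paste0('B', i)),
                                  get(paste0('D', i)),
                                  get(paste0('E', i)))),
      by = ID]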
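
A possible variant of the code above, offered as a sketch rather than a definitive solution: joining the two tables on `ID` instead of column-binding them avoids a duplicated `ID` column and does not rely on both tables being in the same row order. The tables below are trimmed copies of the sample data, and the names `joined` and `result` are hypothetical. `D` is passed directly, so no `D2` copy is needed.

```r
library(data.table)

# Trimmed copies of the sample tables (only the columns used in the formula)
input1a <- data.table(ID = c(37, 45, 900),
                      B1 = c(-0.94, 2.2, -1.223),
                      B2 = c(2.32, 4.54, 7.21),
                      D  = c(43, 320, 390))
input2a <- data.table(ID = c(37, 45, 900),
                      E1 = c(23, -0.2, 12),
                      E2 = c(-0.33, -0.012, -1.342))

# Join on ID: rows are matched by ID and the result has a single ID column
joined <- input1a[input2a, on = "ID"]

fun <- function(B, D, E) 0.00066 * B * D * E

# Add F1 and F2 by reference; D is shared by both column pairs
for (i in 1:2) {
  set(joined, j = paste0("F", i),
      value = fun(joined[[paste0("B", i)]],
                  joined[["D"]],
                  joined[[paste0("E", i)]]))
}

# Keep only the ID and the computed columns, then key the result on ID
result <- joined[, c("ID", "F1", "F2"), with = FALSE]
setkey(result, ID)
result
```

With 12 monthly columns per group, the loop bound would become `1:12`; the rest of the sketch stays the same under the assumption that the columns follow the `B1..B12`, `E1..E12` naming pattern.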
iembry