Update multiple columns by reference in data.table with a single vector

Question

I am trying to update multiple columns of a data.table by reference using the output of a function. As an example, I set up a dummy function and a dummy data.table:

library(data.table)

exampleDT <- data.table(1:6, 0, 0, 0, 0, 0, 0, 0, 0)
exampleDF <- as.data.frame(exampleDT)
exampleFUN <- function(x) seq(x, x + 7)

Expected output:

for(i in 1:NROW(exampleDF)) {
  exampleDF[i, 2:9] <- exampleFUN(i)
}

  V1 V2 V3 V4 V5 V6 V7 V8 V9
1  1  1  2  3  4  5  6  7  8
2  2  2  3  4  5  6  7  8  9
3  3  3  4  5  6  7  8  9 10
4  4  4  5  6  7  8  9 10 11
5  5  5  6  7  8  9 10 11 12
6  6  6  7  8  9 10 11 12 13

Note: my DT has millions of rows, so I'd appreciate it if you could point out a fast solution.

EDIT1: The function I'm providing is just an example; the real one is more complex. It finds the neighboring cells in a raster, taking the borders into account. The first column contains the ID of the cell whose neighbors I want. Ideally the code would be something like:

for(i in 1:NROW(exampleDT)) {
  set(exampleDT, i=i, j=2:9, value=exampleFUN(exampleDT[i, V1]))
}
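
A runnable variant of that loop, as a minimal sketch: `set()` takes `value` as a list with one element per target column, so the vector returned by `exampleFUN` is wrapped in `as.list()`. Pulling `V1` out once (the `ids` variable is introduced here for illustration) also avoids calling `[.data.table` on every iteration.

```r
library(data.table)

exampleDT <- data.table(1:6, 0, 0, 0, 0, 0, 0, 0, 0)
exampleFUN <- function(x) seq(x, x + 7)

# Extract the ID column once instead of subsetting the table in every iteration
ids <- exampleDT$V1

for (i in seq_along(ids)) {
  # One list element per target column (V2..V9); row i is updated by reference
  set(exampleDT, i = i, j = 2:9, value = as.list(exampleFUN(ids[i])))
}
```

This is still one R-level function call per row, so with millions of rows a vectorized computation is likely to be considerably faster.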

EDIT2: The actual function I am planning to use:

neighbourF <- function(CellID, cols, ncell) {

  if(CellID == 1) { # Top left corner
      return(c(NA, NA, NA, CellID + cols , CellID + (cols + 1) , CellID + 1, NA, NA))

  } else if(CellID %in% 2:(cols-1)) { # Top row
      return(c(NA, CellID - 1 , CellID + (cols - 1) , CellID + cols , CellID + (cols + 1) , CellID + 1, NA, NA))

  } else if(CellID == cols) { # Top right corner
      return(c(NA, CellID - 1 , CellID + (cols - 1) , CellID + cols, NA, NA, NA, NA))

  } else if(CellID %in% seq(cols+1, ncell-cols, cols)) { # Left column
      return(c(NA, NA, NA, CellID + cols , CellID + (cols + 1) , CellID + 1 , CellID - (cols - 1) , CellID - cols))

  } else if(CellID == ncell-cols+1) { # Bottom left corner
      return(c(NA, NA, NA, NA, NA, CellID + 1 , CellID - (cols - 1) , CellID - cols))

  } else if(CellID %in% (ncell-cols+2):(ncell-1)) { # Bottom row
      return(c(CellID - (cols + 1) , CellID - 1 , NA, NA, NA, CellID + 1 , CellID - (cols - 1) , CellID - cols))

  } else if(CellID == ncell) { # Bottom right corner
      return(c(CellID - (cols + 1) , CellID - 1 , NA, NA, NA, NA, NA, CellID - cols))  

  } else if(CellID %in% seq(2*cols, ncell-cols, cols)) { # Right column
      return(c(CellID - (cols + 1) , CellID - 1 , CellID + (cols - 1) , CellID + cols , NA, NA, NA, CellID - cols))

  } else {
      return(c(CellID - (cols + 1) , CellID - 1 , CellID + (cols - 1) , CellID + cols , CellID + (cols + 1) , CellID + 1 , CellID - (cols - 1) , CellID - cols))

  }

}

Cheers

It would be relatively straight-forward to write a vectorized implementation of your existing function _(I see @Jaap did so in less time than it took me to type this comment)_, but I suspect your actual use case involves something more complex. If Jaap's answer below doesn't meet your needs, can you provide some more color about what kind of operations you are actually performing? Are the values in each row recursively dependent on the prior rows? — Matt Summersgill, Jul 09 '18 at 15:53
@MattSummersgill, I updated my question better describing the function I am planning to use — Gerald T, Jul 09 '18 at 16:15
Thanks, that provides a much better picture of what you're trying to accomplish. For this specific case, I think using some of the matrix tools in R might serve you best. Does the answer on this question: [Find Neighbouring Elements of a Matrix in R](https://stackoverflow.com/questions/29105175/find-neighbouring-elements-of-a-matrix-in-r) look like it might be a good starting point for your problem? @gregor provided a very in depth answer there. — Matt Summersgill, Jul 09 '18 at 17:40
@MattSummersgill thanks for pointing out that topic. The function I wrote is 1-2 orders of magnitude faster than those in the topic (even with 60M cells). My problem is with the next part of the script, where I will need to process each output to check which neighbor has the closest value to the cell I'm analyzing, check whether that is reciprocal and, if it is, merge them. — Gerald T, Jul 09 '18 at 17:49
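
The vectorized implementation mentioned in these comments could look roughly like the sketch below for the `neighbourF` function in the question. It assumes cells are numbered row by row starting at 1 in the top left corner and that `ncell` is a multiple of `cols`, which is what the branch comments in `neighbourF` suggest. The names `neighbourVec`, `dr`, `dc` and `nbCols` are hypothetical and do not appear in the original post.

```r
library(data.table)

# Hypothetical vectorized counterpart of neighbourF; same neighbour order as the question
neighbourVec <- function(CellID, cols, ncell) {
  nrows <- ncell %/% cols
  r  <- (CellID - 1L) %/% cols   # 0-based row index of each cell
  cl <- (CellID - 1L) %%  cols   # 0-based column index of each cell

  # Row/column offsets of the 8 neighbours, in the order used by neighbourF
  dr <- c(-1L,  0L,  1L, 1L, 1L, 0L, -1L, -1L)
  dc <- c(-1L, -1L, -1L, 0L, 1L, 1L,  1L,  0L)

  nr <- outer(r,  dr, `+`)       # neighbour rows, one matrix row per cell
  nc <- outer(cl, dc, `+`)       # neighbour columns, one matrix row per cell

  out <- nr * cols + nc + 1L     # convert (row, column) back to cell IDs
  out[nr < 0L | nr >= nrows | nc < 0L | nc >= cols] <- NA  # outside the raster
  out
}

# Toy raster with 4 rows and 5 columns
DT <- data.table(V1 = 1:20)
nbCols <- paste0("N", 1:8)
DT[, (nbCols) := as.data.table(neighbourVec(V1, cols = 5L, ncell = 20L))]
```

Because the border handling is a single logical mask rather than a chain of `if` branches, the whole ID column is processed in one call and all eight neighbour columns are assigned in one `:=`.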

Jaap · Answer 1 · 2018-07-09T15:59:01.377

0

A possible solution:

# option 1
exampleDT[, (2:9) := .SD + outer(.I, 0:7, `+`), .SDcols = 2:9][]

# option 2
exampleDT[, (2:9) := .SD + outer(1:nrow(exampleDT), 0:7, `+`), .SDcols = 2:9][]

which gives:

> exampleDT
   V1 V2 V3 V4 V5 V6 V7 V8 V9
1:  1  1  2  3  4  5  6  7  8
2:  2  2  3  4  5  6  7  8  9
3:  3  3  4  5  6  7  8  9 10
4:  4  4  5  6  7  8  9 10 11
5:  5  5  6  7  8  9 10 11 12
6:  6  6  7  8  9 10 11 12 13
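
`outer()` works here because the example function is simple arithmetic. For a function that has to be called once per ID, one possible sketch is to collect the results in a matrix and assign all eight columns in a single `:=`. The name `res` is introduced for illustration, and `exampleFUN` stands in for the real function, which is assumed to always return a numeric vector of length 8.

```r
library(data.table)

exampleDT <- data.table(1:6, 0, 0, 0, 0, 0, 0, 0, 0)
exampleFUN <- function(x) seq(x, x + 7)

# vapply() returns an 8 x nrow matrix (one column per ID); t() makes it nrow x 8
res <- t(vapply(exampleDT$V1, exampleFUN, numeric(8)))

# as.data.table() turns the matrix into a list of 8 columns, assigned by reference
exampleDT[, (2:9) := as.data.table(res)]
```

The function is still evaluated once per row, so this mainly removes the per-row assignment overhead; it does not replace a truly vectorized rewrite.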

edited Jul 09 '18 at 15:59

answered Jul 09 '18 at 15:51

Hi Jaap, the actual function I'm using in my script is quite complex. I'm editing my question – Gerald T Jul 09 '18 at 15:58
@GeraldT Could you update your question so that it better reflects the complexity of the function you are using? Right now it doesn't, which makes it hard to guess what you want to achieve. – Jaap Jul 09 '18 at 16:09
