18

Can I somehow sub-assign by reference on atomic vectors?
That is, without wrapping the vector in a one-column data.table just to use :=.

library(data.table)
N <- 5e7
x <- sample(letters, N, TRUE)
X <- data.table(x = x)
upd_i <- sample(N, 1L, FALSE)
system.time(x[upd_i] <- NA_character_)
#    user  system elapsed 
#    0.11    0.06    0.17 
system.time(X[upd_i, x := NA_character_])
#    user  system elapsed 
#    0.00    0.00    0.03 

If R6 can help with that, I'm open to an R6 solution, as it is already one of my dependencies.
I've already checked that <- inside an R6 object still makes a copy: gist.

user227710
jangorecki
  • Interesting. Never heard of R6. Looks exciting. – David Arenburg May 20 '15 at 08:52
  • @DavidArenburg R6 is IMO the best reference class and OOP tool in R. Definitely worth learning, and easy to learn. – jangorecki May 20 '15 at 09:19
  • I'm guessing you can modify by reference with Rcpp. Google yielded: http://stackoverflow.com/q/11300048/1191259 Ari's answer to the question linked there has a simple function for it that outperforms the rest. – Frank May 20 '15 at 14:13

2 Answers

9

In recent R versions (3.1 to 3.1.2 or later, or so), assignment to a vector element does not copy the vector. You will not see that by running the OP's code, though, for the following reason. Because x is passed to data.table() before it is modified, R treats it as possibly shared. data.table() does copy x, but R is not notified of that copy and has to assume that none was made, so it copies x the first time x is modified afterwards. (In this particular case, I think it would be good to change data.table::data.table so that it notifies R that a copy has been made, but that is a separate issue; data.frame suffers from the same problem.) If you change the order of the commands a bit, you see no difference:

N <- 5e7
x <- sample(letters, N, TRUE)
upd_i <- sample(N, 1L, FALSE)
# no copy here:
system.time(x[upd_i] <- NA_character_)
#   user  system elapsed 
#      0       0       0 
X <- data.table(x = x)
system.time(X[upd_i, x := NA_character_])
#   user  system elapsed 
#      0       0       0 

# but now R will copy:
system.time(x[upd_i] <- NA_character_)
#   user  system elapsed 
#   0.28    0.08    0.36 
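
The same effect can be checked without timings. The following is a minimal sketch, assuming an R build with memory profiling enabled (so that base R's tracemem() is available) and code run at top level outside an IDE; the variable names are illustrative only. It compares the address reported by tracemem() before and after each assignment:

```r
# Assumption: R >= 3.1 with memory profiling support; run via Rscript or a plain console.
x <- letters[1:3]            # fresh character vector, referenced only by x
before <- tracemem(x)        # start tracing; returns the address as a string

x[2L] <- NA_character_       # x is the only reference
after_first <- tracemem(x)   # already traced, so this just reports the address
in_place <- identical(before, after_first)

y <- x                       # y now refers to the same memory as x
x[3L] <- NA_character_       # R has to copy x so that y keeps its values
after_second <- tracemem(x)
copied <- !identical(before, after_second)

untracemem(x)
untracemem(y)
```

In this setting in_place should be TRUE and copied should be TRUE. Some front ends keep their own references to objects in the workspace, in which case even the first assignment may copy.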

(old answer, mostly left as a curiosity)

You actually can use the data.table := operator to modify your vector in place (I think you need R version 3.1+ to avoid the copy in list):

modify.vector = function (v, idx, value) setDT(list(v))[idx, V1 := value]

v = 1:5
address(v)
#[1] "000000002CC7AC48"

modify.vector(v, 4, 10)
v
#[1]  1  2  3 10  5

address(v)
#[1] "000000002CC7AC48"
eddi
  • Cannot agree more on the solution. I benchmarked it and it is super fast. @eddi I think it is a good feature for a PR. – jangorecki Jul 24 '15 at 22:16
  • Hmm, I'm not sure why this works... I.e., if you do `res <- setDT(list(v))[4, V1 := 10]` you will get back a `data.table` rather than a vector. I once asked a [similar question](http://stackoverflow.com/questions/24426164/why-doesnt-setdt-have-any-effect-in-this-case) and was told by R-devs that this shouldn't work – David Arenburg Jul 26 '15 at 13:01
  • @DavidArenburg the reason why it works is that neither `list` nor `setDT` copy the underlying data. They do create new objects though that are wrapping the data, so that `res` is new object, but which contains the original vector inside. – eddi Jul 26 '15 at 15:10
  • @jangorecki hmm, while this answer probably only works for R 3.1+, it's actually outdated for R 3.1+, as I'm pretty sure no extra copies are made for regular vector assignment now, i.e. `v[4] = 20` does not make any unnecessary copies – eddi Aug 04 '15 at 21:23
5

As suggested by @Frank, it's possible to do this using Rcpp. Here's a version including a macro inspired by Rcpp's dispatch.h which handles all atomic vector types:

mod_vector.cpp

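#include <Rcpp.h>
using namespace Rcpp;

// Overwrite x[i] with value in place; i holds 1-based R indices.
template <int RTYPE>
Vector<RTYPE> mod_vector_impl(Vector<RTYPE> x, IntegerVector i, Vector<RTYPE> value) {
  if (i.size() != value.size()) {
    stop("i and value must have same length.");
  }
  for (R_xlen_t a = 0; a < i.size(); a++) {
    // Reject NA (stored as a negative sentinel), zero, negative and too-large indices.
    if (i[a] < 1 || i[a] > x.size()) {
      stop("Index out of bounds.");
    }
    x[i[a] - 1] = value[a];
  }
  return x;
}

// Names containing double underscores are reserved in C++, so the macro uses plain names.
#define MV_HANDLE_CASE(MV_RTYPE) case MV_RTYPE : return mod_vector_impl(Vector<MV_RTYPE>(x), i, Vector<MV_RTYPE>(value));

// [[Rcpp::export]]
SEXP mod_vector(SEXP x, IntegerVector i, SEXP value) {
  // Dispatch on the runtime type of x; value is converted to the same type.
  switch(TYPEOF(x)) {
    MV_HANDLE_CASE(INTSXP)
    MV_HANDLE_CASE(REALSXP)
    MV_HANDLE_CASE(RAWSXP)
    MV_HANDLE_CASE(LGLSXP)
    MV_HANDLE_CASE(CPLXSXP)
    MV_HANDLE_CASE(STRSXP)
    MV_HANDLE_CASE(VECSXP)
    MV_HANDLE_CASE(EXPRSXP)
  }
  stop("Not supported.");
  return x; // not reached; keeps compilers from warning about a missing return
}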

Example:

x <- 1:20
address(x)
#[1] "0x564e7e8"
mod_vector(x, 4:5, 12:13)
# [1]  1  2  3 12 13  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
address(x)
#[1] "0x564e7e8"
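
One consequence of writing into the vector's memory from C++ is that R's copy-on-write protection is bypassed. The following is a minimal sketch of that caveat, assuming Rcpp and a C++ toolchain are installed; set_int is a hypothetical integer-only helper compiled inline for illustration, not part of the code above:

```r
library(Rcpp)

# Hypothetical helper: sets element i (1-based) of an integer vector in place.
cppFunction('
IntegerVector set_int(IntegerVector x, int i, int value) {
  if (i < 1 || i > x.size()) stop("Index out of bounds.");
  x[i - 1] = value;  // writes straight into the memory of the R vector
  return x;
}')

a <- as.integer(c(1, 2, 3))  # fresh integer vector
b <- a                       # b refers to the same memory as a
invisible(set_int(a, 2L, 99L))
# a and b have both changed, because no copy was made.

d <- as.numeric(1:3)         # double, not integer
invisible(set_int(d, 2L, 99L))
# d is unchanged: Rcpp first converted it to a temporary integer vector.
```

So in-place modification of this kind is only safe when no other variable shares the vector, and it silently has no effect on the caller's object when the argument type forces a conversion.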

Comparison with the base and data.table methods. Judging by the mean and median timings, mod_vector is much faster than either:

x <- 1:2e7
microbenchmark::microbenchmark(mod_vector(x, 4:5, 12:13), x[4:5] <- 12:13, modify.vector(x, 4:5, 12:13))
#Unit: microseconds
#                         expr     min       lq        mean    median         uq
#    mod_vector(x, 4:5, 12:13)   5.967   7.3480    15.05259     9.718    21.0135
#              x[4:5] <- 12:13   2.953   5.3610 45722.61334 48122.996 52623.1505
# modify.vector(x, 4:5, 12:13) 954.577 988.7785  1177.17925  1021.380  1361.1210
#        max neval
#     58.463   100
# 126978.146   100
#   1559.985   100
Nick Kennedy