
It is mentioned here that R uses copy-on-modify semantics when a variable is assigned to a new one, including when an argument is passed to a function.

However, does slicing (a vector, list, or data frame) create a new object of the same type that contains copies of the selected subset of the original? Or are the elements stored in the new object just copy-on-modify references to the original ones?
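For reference, here is a minimal base-R sketch of the copy-on-modify behavior the question refers to, for both assignment and function arguments. The helper `set_first()` is hypothetical, written only for this illustration.

```r
# Assignment: y initially shares x's data; modifying y forces a copy
x <- c(1, 2, 3)
y <- x
y[1] <- 10

# Function arguments behave the same way.
# set_first() is a hypothetical helper used only for this illustration.
set_first <- function(v) {
  v[1] <- 0   # modifies the function's local copy, not the caller's vector
  v
}
z <- set_first(x)

print(x)  # still 1 2 3
print(y)
print(z)
```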

JiaHao Xu

4 Answers


This is a complex topic. You should start by reading about the NAMED mechanism.

If you run the following, you see that the list elements are not copied (because a list is basically a vector of pointers to its elements, and subsetting copies only the pointers):

> a <- list(1, 2, 3, 4, 5)
> 
> b <- a[1:2]
> .Internal(inspect(b)) 
@0x000000001327e5b8 19 VECSXP g0c2 [NAM(3)] (len=2, tl=0)
  @0x00000000136f6b60 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 1
  @0x00000000136f6b28 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 2
> 
> 
> c <- a[1:2]
> .Internal(inspect(c)) 
@0x000000001327e678 19 VECSXP g0c2 [NAM(3)] (len=2, tl=0)
  @0x00000000136f6b60 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 1
  @0x00000000136f6b28 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 2
> 
> b[1] <- 6
> .Internal(inspect(b)) 
@0x000000001327e6f8 19 VECSXP g0c2 [NAM(1)] (len=2, tl=0)
  @0x0000000013745b58 14 REALSXP g0c1 [] (len=1, tl=0) 6
  @0x00000000136f6b28 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 2
> 
> .Internal(inspect(c))
@0x000000001327e678 19 VECSXP g0c2 [NAM(3)] (len=2, tl=0)
  @0x00000000136f6b60 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 1
  @0x00000000136f6b28 14 REALSXP g0c1 [NAM(3)] (len=1, tl=0) 2
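The same check can be sketched without reading `.Internal(inspect())` output, assuming the lobstr package is installed (the address comparison is skipped otherwise). `lobstr::obj_addrs()` returns the address of each element of a list, so equal addresses mean the elements are shared objects.

```r
# Assumes the lobstr package is installed; the address checks are skipped otherwise.
has_lobstr <- requireNamespace("lobstr", quietly = TRUE)

a <- list(1, 2, 3, 4, 5)
b <- a[1:2]   # a new list whose elements point to the same objects as a[[1]], a[[2]]

shared_before <- NA
if (has_lobstr) {
  # obj_addrs() gives one address per list element; equal addresses = shared objects
  shared_before <- identical(lobstr::obj_addrs(a)[1:2], lobstr::obj_addrs(b))
}

b[1] <- 6     # replaces b's first element only; a is unchanged

shared_after <- c(NA, NA)
if (has_lobstr) {
  # Element 1 of b is now a new object, element 2 is still shared with a
  shared_after <- unname(lobstr::obj_addrs(a)[1:2] == lobstr::obj_addrs(b))
}
print(shared_before)
print(shared_after)
```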

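This is different if you subset atomic vectors: their elements are not separate objects, so the selected values are copied into a newly allocated vector.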
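A rough way to see the difference, again assuming the lobstr package is available (the size comparison is skipped otherwise): `lobstr::obj_size()` called with several objects counts shared memory only once, so for an atomic vector and its subset the joint size should be about the sum of the two separate sizes, because nothing is shared.

```r
# Assumes the lobstr package is installed; the size comparison is skipped otherwise.
has_lobstr <- requireNamespace("lobstr", quietly = TRUE)

x <- runif(1e6)        # 1e6 doubles stored in one contiguous block
y <- x[1:500000]       # a new vector: the selected values are copied into it

y[1] <- -1             # y is an independent object, so x is not affected

vec_separate <- NA
vec_joint <- NA
if (has_lobstr) {
  vec_separate <- as.numeric(lobstr::obj_size(x)) + as.numeric(lobstr::obj_size(y))
  vec_joint <- as.numeric(lobstr::obj_size(x, y))   # shared memory would be counted once
  print(c(separate = vec_separate, joint = vec_joint))
}
```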

You might also be interested in the new reference counting mechanism.

Roland
  • I see why atomic vectors are different from lists. They store all elements in one contiguous memory chunk instead of one by one, since their type is known when the vector is created. This can minimize fragmentation, improve locality, and probably use less space, since only one type needs to be recorded. – JiaHao Xu Mar 01 '19 at 06:23
  • It's a bit more complex since R 3.5.0 introduced ALTREP. – Roland Mar 01 '19 at 06:50
  • I just realized that this is called a shallow copy. I also wonder: can this happen with other containers in R (like user-defined ones)? – JiaHao Xu Mar 01 '19 at 10:00
  • As long as you implement them in C/C++, you can control stuff like this. The data.table package does that. – Roland Mar 01 '19 at 10:19
  • So containers provided by R packages can actually do this? That's cool. Thanks so much. – JiaHao Xu Mar 01 '19 at 11:08

Subsetting an atomic vector to a shorter one gives you a new vector. Subsetting entire vectors out of an object (for example, elements of a list) gives you copy-on-modify references to them. The consequence is that you can subset a list to get a new, shorter list object whose contents are references to the contents of the original one (with no extra memory cost for those contents) until you modify them.

See Hadley's notes on memory management for more detail.
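A sketch of the memory claim, assuming the lobstr package is installed (the size comparison is skipped otherwise): `lobstr::obj_size()` with several arguments counts shared components once, so the joint size of a list and its subset should be far below the sum of their separate sizes. Modifying an element of the subset then copies only that element's data.

```r
# Assumes the lobstr package is installed; the size comparison is skipped otherwise.
has_lobstr <- requireNamespace("lobstr", quietly = TRUE)

big_list <- list(runif(1e6), runif(1e6), runif(1e6))  # three ~8 MB elements
sub_list <- big_list[1:2]   # new list; its elements are the same vectors as big_list's

list_separate <- NA
list_joint <- NA
if (has_lobstr) {
  list_separate <- as.numeric(lobstr::obj_size(big_list)) +
    as.numeric(lobstr::obj_size(sub_list))
  # sub_list adds only its own small vector of pointers to the joint total
  list_joint <- as.numeric(lobstr::obj_size(big_list, sub_list))
  print(c(separate = list_separate, joint = list_joint))
}

# Copy-on-modify: this element's data is duplicated, the second element stays shared
sub_list[[1]][1] <- -1
stopifnot(big_list[[1]][1] != -1)
```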

James

Unlike, perhaps, Python, R creates a new object any time you slice one. For example:

> a=c(1,2,3,4,5)
> a
[1] 1 2 3 4 5
> b=a[1]
> b
[1] 1
> b=7
> b
[1] 7
> a
[1] 1 2 3 4 5

This works the same for vectors, lists, and data frames. Take a look at this post for reference objects in R.
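To illustrate the claim for lists and data frames as well, here is a small sketch; the object names are made up for the example. Modifying the slice leaves the original untouched in both cases.

```r
# List: take a slice, then modify an element of the slice
lst <- list(a = 1:3, b = c("x", "y"))
part <- lst["a"]
part$a[1] <- 99L

# Data frame: take the first two rows, then modify a value in the slice
df <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))
top <- df[1:2, ]
top$score[1] <- 100

print(lst$a)     # original list element is unchanged
print(df$score)  # original column is unchanged
```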

boski
  • I know that slicing always creates a new object, but are the elements contained in the new object copies of the originals or just copy-on-modify references? – JiaHao Xu Feb 27 '19 at 10:55
  • Maybe take a look at https://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source – boski Feb 27 '19 at 10:57

When subsetting a vector, R does not trigger copy-on-modify, but it does create a new object from the one being subset. For example:

> library(lobstr)
> x = rnorm(10)
# track copies of x:
> tracemem(x)
[1] "<0xcad0728>"
> y = x # shared binding
> ref(x, y)
[1:0xcad0728] <dbl> 

[1:0xcad0728]
# So x and y point to the same object at 0xcad0728: a shared binding.
# Now let's subset:
> y = y[1:5] # no message from tracemem(), so no copy-on-modify was triggered
# Now let's check the memory addresses:
> obj_addr(x)
[1] "0xcad0728"
> obj_addr(y)
[1] "0xcd1e198"

As we can see, tracemem() reports no copy of x. However, subsetting did create another object in memory, so the name y no longer points to 0xcad0728 but to a new object at 0xcd1e198.
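The same observation can be reproduced with base R only, by capturing what `tracemem()` prints. This is a sketch assuming R was built with memory profiling, which the code checks with `capabilities("profmem")`; without it nothing is traced and both captures stay empty.

```r
x <- rnorm(10)
y <- x                               # shared binding: x and y refer to one vector

traced <- capabilities("profmem")    # tracemem() needs R built with memory profiling
if (traced) invisible(tracemem(x))

# capture.output() collects anything tracemem() prints while the expression runs
subset_msgs <- capture.output(z <- y[1:5])  # subsetting allocates z but never duplicates x
modify_msgs <- capture.output(y[1] <- 0)    # modifying the shared vector duplicates it

if (traced) untracemem(x)

length(subset_msgs)   # 0: no copy reported
length(modify_msgs)   # at least 1 when tracing is available
```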