Dropping an element from a list via conventional means (for example, ll["name"] <- NULL) causes the entire list to be copied. Normally this is not noticeable, until the data sets become large.

I have a list with a dozen elements, each between 0.25 and 2 GB in size. Dropping three elements from this list takes about ten minutes to execute (on a relatively fast machine).

Is there a way to drop elements from a list in-place?


I have tried the following:

TEST <- list(A=1:20,  B=1:5)

TEST[["B"]] <- NULL
TEST["B"] <- NULL
TEST <- TEST[c(TRUE, FALSE)]
data.table::set(TEST, "B", value=NULL) # ERROR

Output with memory info:

cat("\n\n\nATTEMPT 1\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST[["B"]] <- NULL
.Internal(inspect(TEST))

cat("\n\n\nATTEMPT 2\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST["B"] <- NULL
.Internal(inspect(TEST))

cat("\n\n\nATTEMPT 3\n")
TEST <- list(A=1:20,  B=1:5)
.Internal(inspect(TEST))
TEST <- TEST[c(TRUE, FALSE)]
.Internal(inspect(TEST))
Ricardo Saporta
  • Is this an application for which you could use an environment as a storage container instead of a list? If you do `e <- as.environment(TEST)`, `rm("A", envir=e)` should be much faster than `TEST["A"] <- NULL`. (Untested). – Josh O'Brien Oct 22 '13 at 22:07
  • There's a great question regarding copying/modifying on [**this page**](http://stackoverflow.com/a/16370240/1478381) and I link to this particular answer because it is very illustrative and probably deserves more love! – Simon O'Hanlon Oct 22 '13 at 22:09
  • I think you will need `reflist` as proposed [here](https://r-forge.r-project.org/tracker/?func=detail&atid=978&aid=2351&group_id=240). An approach (similar to `data.table := / set`) that will work in lists. As yet not implemented. – mnel Oct 22 '13 at 22:19

2 Answers

I don't know how you could make a vector shorter without copying it. The next best thing would be to keep the slot and set the element to a missing value (NA) or to NULL.

According to ?Extract, you have to specify TEST[i] <- list(NULL) to set an element to NULL. And my tests indicate that i must be an integer or logical vector.

> TEST <- list(A=1:20,  B=1:5); .Internal(inspect(TEST))
@27d2c60 19 VECSXP g0c2 [NAM(1),ATT] (len=2, tl=0)
  @27dd9e0 13 INTSXP g0c6 [] (len=20, tl=0) 1,2,3,4,5,...
  @2805c98 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
  @1f38be8 02 LISTSXP g0c0 [] 
    TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @2807430 16 STRSXP g0c2 [] (len=2, tl=0)
      @dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
      @dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
> TEST[2] <- list(NULL); .Internal(inspect(TEST)); TEST
@27d2c60 19 VECSXP g0c2 [MARK,NAM(1),ATT] (len=2, tl=0)
  @27dd9e0 13 INTSXP g0c6 [MARK] (len=20, tl=0) 1,2,3,4,5,...
  @d3fb78 00 NILSXP g1c0 [MARK,NAM(2)] 
ATTRIB:
  @1f38be8 02 LISTSXP g0c0 [MARK] 
    TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @2807430 16 STRSXP g0c2 [MARK] (len=2, tl=0)
      @dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
      @dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
$A
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

$B
NULL
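
A minimal sketch of how this could be used in practice, assuming it is acceptable for the list to keep its length for a while. The list `big` and its contents are hypothetical stand-ins for the large elements in the question. Once a slot holds NULL, the old value becomes eligible for garbage collection as long as nothing else refers to it; the emptied slots can then be dropped in a single step later.

```r
# Hypothetical example list; names and sizes are illustrative only.
big <- list(A = 1:20, B = 1:5, C = letters)

# Single-bracket assignment of list(NULL) keeps the slot but empties it.
big[2] <- list(NULL)

length(big)           # still 3: the element is emptied, not dropped
names(big)            # "A" "B" "C"
is.null(big[["B"]])   # TRUE

# Later, when one rebuild of the list is acceptable, drop all emptied
# slots at once instead of removing elements one by one.
big <- big[!vapply(big, is.null, logical(1))]
names(big)            # "A" "C"
```

Note that the final subsetting step builds a new list, so it is worth doing once after all unwanted elements have been set to NULL rather than once per element.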
Joshua Ulrich
  • This isn't quite the same though. OP wants to remove an element from a list; this will keep the list the same, but set one of the elements to `NULL`. – Hong Ooi Oct 22 '13 at 22:06
  • @HongOoi: I can't think of a way to change the length of a vector without copying it. – Joshua Ulrich Oct 22 '13 at 22:07
  • True, but internally a list is just a vector of pointers. One would think that you could copy that vector without also copying all the objects that it points to. – Hong Ooi Oct 22 '13 at 22:14
  • @HongOoi: Yes, you could copy all the pointers to a *new* list, but that's not what they asked. They want to change the length of the list. I don't think you can do that. And I'm not going to write the required C code to move the pointers to a new list. :) – Joshua Ulrich Oct 22 '13 at 22:36
  • Thanks Josh, wrapping `NULL` in `list( )` does the trick. Hong, you are correct in that it isn't fully dropping the element, but it is releasing it from memory (at least after `gc()`) which was my desired outcome. – Ricardo Saporta Oct 23 '13 at 14:29

As @JoshO'Brien suggested in his comment, it is much more efficient to use environments instead of lists to store large objects in memory. In my experience, environments confer significant time and memory advantages for large-object storage:

Element lookup time.

Have you noticed that it can be quite slow (a few seconds) to access an object at the end of your list? That's because lists don't know where each element is in memory; they have to find each element by searching through the list (I think).

Accessing a variable in an environment on the other hand is instantaneous (it only has to search through the list of variable names stored in the environment). This is noticeable when your list elements are large!

In place modification.

When modifying (or removing) variables in an environment, only the individual object is copied. When you modify a list, the whole list is copied in the process.

Working with environments

  1. Defining a new environment: TEST <- new.env()
  2. Casting to an environment: TEST <- as.environment(TEST)
  3. Element deletion: rm(A, envir=TEST)
  4. Element creation: TEST$A <- 1:20
  5. Element access: TEST$A
  6. Listing objects stored: ls(pos=TEST) (This is the equivalent of names(TEST))
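
The operations listed above can be sketched end to end as follows. This is only an illustration on the small `TEST` list from the question; the variable names `e` and `back` are introduced here for the example. It assumes the list is fully named, since `as.environment()` needs a name for every element.

```r
# Small stand-in for the large list in the question.
TEST <- list(A = 1:20, B = 1:5)

# One-time conversion from a named list to an environment.
e <- as.environment(TEST)

# Remove a binding; the other bindings are left alone.
rm("B", envir = e)

# Create a new binding and read an existing one.
e$C <- letters
head(e$A)

# List what is stored; ls() returns the names sorted.
ls(e)                                      # "A" "C"
exists("B", envir = e, inherits = FALSE)   # FALSE

# Convert back to a list if a list is needed downstream.
# The element order of as.list() on an environment is not guaranteed.
back <- as.list(e)
```

If a list is required at the end, the round trip through `as.list()` still builds a new list, so the benefit is mainly in doing all removals and modifications while the data lives in the environment.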
Scott Ritchie
  • Thanks @Manetheran, I make use of environments extensively. I'm dealing with a list object I am importing though and so this isn't an available option to me. (+1 for the helpful answer though) – Ricardo Saporta Oct 23 '13 at 14:31