I want to subset some rows of a data table. Like this:

# load package and data
  library(data.table)
  data("mtcars")

# convert to data table
  setDT(mtcars, keep.rownames = TRUE)

# Subset data
  mtcars <- mtcars[like(rn, "Mer"), ] # or
  mtcars <- mtcars[mpg > 20, ]

However, I'm working with a huge data set and I want to avoid using <-, which is not memory efficient because it makes a copy of the data.

Is this correct? Is it possible to update the filtered data without <- ?

Henrik
rafa.pereira
  • How can you update data without assigning it to a variable? In the end, after all your processing, the changes have to be assigned to a variable. – Dhawal Kapil Oct 01 '15 at 08:36
  • Why do you want to subset the data, if you don't want to store it? Do you need it only temporarily? Or do you only need the subset and want to drop the original, and you are looking for an efficient way to do this? – RHA Oct 01 '15 at 08:49
  • I think you are asking for the impossible. This could be an interesting FR on GH though. But I believe that for such a thing to be implemented it will require a *lot* of development. – David Arenburg Oct 01 '15 at 08:54
  • Arguably a dupe of http://stackoverflow.com/q/10790204/1191259 – Frank Oct 01 '15 at 14:04
  • @DavidArenburg Just delete by reference is not that hard actually. The benefit would be mainly memory efficiency rather than speed so much. – Matt Dowle Oct 01 '15 at 17:20

1 Answer

What you are asking for is deleting rows by reference.

It is not yet possible, but there is a feature request for it: #635.

Until then, you need to make an in-memory copy of your data.table subset. The copy is made whenever <- (or =) is combined with a subset (the i argument), so for now you cannot avoid it.
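
To make the copy visible, here is a minimal sketch that compares memory addresses with data.table's address(). The names dt, sub, addr_before and addr_after are made up for illustration, and the := step is only there as a contrast, since updating columns is something data.table can already do by reference:

```r
library(data.table)

# Work on a converted copy so the built-in mtcars data set is left untouched
dt <- as.data.table(mtcars, keep.rownames = TRUE)

# Subsetting rows with i allocates a new data.table
sub <- dt[mpg > 20]
subset_is_new_object <- address(dt) != address(sub)

# By contrast, := updates a column by reference: same object before and after
addr_before <- address(sub)
sub[, hp := hp * 2]
addr_after <- address(sub)
update_in_place <- addr_before == addr_after

subset_is_new_object
update_in_place
```

Under these assumptions both flags should come out TRUE: the row subset lives at a new address, while the column update leaves the object where it was.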

If it helps, you can operate on language objects to predefine the operation and delay its evaluation, and reuse the predefined objects multiple times:

mtcars_sub <- quote(mtcars[like(rn,"Mer")])
mtcars_sub2 <- quote(eval(mtcars_sub)[mpg > 20])
eval(mtcars_sub2)
#           rn  mpg cyl  disp hp drat   wt qsec vs am gear carb
# 1: Merc 240D 24.4   4 146.7 62 3.69 3.19 20.0  1  0    4    2
# 2:  Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2
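
As a rough illustration of what "delayed evaluation" buys you, the sketch below evaluates the same quoted expression twice, before and after the underlying table changes. The names dt, merc_expr, n_first and n_second are hypothetical, and the cyl == 4 filter is just an arbitrary example:

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = TRUE)

# Nothing is subset here; the call is only stored as a language object
merc_expr <- quote(dt[like(rn, "Merc")])

# First evaluation runs against the full table
n_first <- nrow(eval(merc_expr))

# Rebind dt to a smaller table (this step still makes a copy)
dt <- dt[cyl == 4]

# The same stored expression now sees the current dt
n_second <- nrow(eval(merc_expr))

c(n_first, n_second)
```

The expression is looked up at eval() time, so it can be defined once and reused against whatever dt currently holds. It does not remove the copy made by the subset itself.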

BTW, when subsetting a data.table you don't need the trailing comma as in dt[x==1,]; you can simply use dt[x==1].
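
A quick check of that equivalence, as a sketch on a converted copy of mtcars (dt, with_comma, without_comma and same_result are illustrative names):

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = TRUE)

with_comma    <- dt[mpg > 20, ]  # data.frame-style row subset
without_comma <- dt[mpg > 20]    # data.table shorthand

# all.equal() compares the contents of the two results
same_result <- isTRUE(all.equal(with_comma, without_comma))
same_result
```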

jangorecki