3

I'm looking to process columns by criteria like class or common pattern matching via grep.

My first attempt did not work:

require(data.table)
test.table <- data.table(a=1:10,ab=1:10,b=101:110)
##this does not work and hangs on my machine
test.table[,lapply(names(test.table)[grep("a",names(test.table))], get)]

Ricardo Saporta notes in an answer that you can use this construct, but you have to wrap get in a dummy function:

##this works
test.table[,lapply(names(test.table)[grep("a",names(test.table))], function(x) get(x))]

Why do you need the anonymous function?

(The preferred/cleaner method is via .SDcols:)

test.table[,.SD,.SDcols=grep("a",names(test.table))]
test.table[, grep("a", names(test.table)), with = FALSE]
Blue Magister
  • `get` is the standard method for converting a character value to a language object. – IRTFM Aug 05 '13 at 18:05
  • Note: `grep` has a `value=TRUE` option. You could just write: `lapply(grep("a", names(test.table), value=TRUE), get)` – Arun Aug 05 '13 at 20:32

3 Answers

3

While @Ricardo is correct that it is safer to wrap primitives, or functions that rely on method dispatch, in a wrapper, here we can avoid this by telling get which environment to search. The trick with lapply is to use sys.parent(n) (in this case n = 0 will work) to obtain the appropriate calling environment.

test.table[,lapply(grep('a',names(test.table),value=TRUE), 
                    get, envir = sys.parent(0))]

(More information can be found in the related question "Using get inside lapply, inside a function".)
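The same idea can be shown without data.table. This is a minimal base R sketch, not the exact mechanism data.table uses: demo_env and its variables are hypothetical names made up for illustration. Passing the environment explicitly means get no longer depends on where lapply calls it from.

```r
# Hypothetical environment and variable names, for illustration only
demo_env <- new.env()
demo_env$var1 <- "I am var1"
demo_env$var2 <- "I am var2"

# Pass the environment explicitly, so no wrapper function is needed
explicit <- lapply(c("var1", "var2"), get, envir = demo_env)

# mget() performs the same lookup for a whole character vector in one call
# and returns a list named after the variables
via_mget <- mget(c("var1", "var2"), envir = demo_env)
```

Naming the environment is arguably easier to read than relying on sys.parent, since the lookup target is visible at the call site.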

mnel
2

This is a function of lapply, not really data.table. From the lapply documentation:

For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g. bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[0L]], ...), with 0L replaced by the current integer index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.

Update re @Hadley's and @DWin's comments:

EE <- new.env()
EE$var1 <- "I am var1 in EE"
EE$var2 <- "I am var2 in EE"

## Calling get directly
with(EE, lapply(c("var1", "var2"), get))
Error in FUN(c("var1", "var2")[[1L]], ...) : object 'var1' not found

## Calling get via an anonymous function
with(EE, lapply(c("var1", "var2"), function(x) get(x)))
[[1]]
[1] "I am var1 in EE"

[[2]]
[1] "I am var2 in EE"

with(EE, lapply(c("var1", "var2"), rm))
Error in FUN(c("var1", "var2")[[1L]], ...) : 
  ... must contain names or character strings

with(EE, lapply(c("var1", "var2"), function(x) rm(x)))
[[1]]
NULL

[[2]]
NULL

# Note: rm(x) removes only the wrapper's local x, so var1 & var2 are still in EE (check with ls(EE))
EE
<environment: 0x1154d0060>
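A caveat on the rm example: rm(x) inside a wrapper removes the object literally named x in the wrapper's own frame, not the variable whose name x holds. The sketch below checks this with ls and shows one way to remove objects by name, using the list and envir arguments of rm. EE2 is a hypothetical environment created just for this illustration.

```r
# Hypothetical environment, for illustration only
EE2 <- new.env()
EE2$var1 <- "I am var1 in EE2"
EE2$var2 <- "I am var2 in EE2"

# rm(x) removes the wrapper's local argument x, not var1 or var2
invisible(with(EE2, lapply(c("var1", "var2"), function(x) rm(x))))
after_wrapper <- ls(EE2)  # both variables are still present

# Remove by name: pass the character value via list= and name the environment
invisible(lapply(c("var1", "var2"), function(x) rm(list = x, envir = EE2)))
after_list <- ls(EE2)     # now empty
```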
Ricardo Saporta
  • But get is not primitive, nor does it use `sys.call` or `match.call`. – hadley Aug 07 '13 at 01:46
  • @hadley, you are correct in that `get` is none of the things explicitly described. Maybe the documentation should read "..or if it is a primitive **or internal** function that makes use of the call"? – Ricardo Saporta Oct 30 '13 at 01:36
  • I think this is really due to a quirk in `data.table`, not `get` or `lapply`. Note that if `test.table` is a data frame, then neither of the OP's code examples work. Ie, it doesn't make a difference if you wrap `get` in an anonymous function. – Hong Ooi Oct 30 '13 at 02:06
  • @Hong Ooi, In so far as the `j` argument in `[.data.table` expects an expression whereas in `[.data.frame`, `j` expects "which elements to extract", then yes, it is a quirk of the syntax. However, as for the OP and "why is a dummy function necessary for `get`", that is a requirement of `lapply` in almost any context (where one is not using the `envir=` argument of `get`). – Ricardo Saporta Oct 30 '13 at 02:09
  • Sure, I'm aware of that issue with primitive functions. It just doesn't seem relevant in this case. Also, in your example, you don't need an anonymous function either: `lapply(c("var1", "var2"), get, EE)` – Hong Ooi Oct 30 '13 at 02:23
  • In neither case is the anonymous function **needed**. Hence the parenthetical remark at the end of my last comment :) – Ricardo Saporta Oct 30 '13 at 02:25
  • @HongOoi, I am not sure how it is not relevant? The updated example has nothing to do with `data.table` yet produces the exact same effect – Ricardo Saporta Oct 30 '13 at 02:26
-1

It's only because data.table evaluates the j expression (in simpler terms, everything after the first comma in DT[,...]) as an actual expression. So DT[,"Column1"] returns "Column1", just as with(DT, "Column1") returns "Column1". It's in the data.table FAQ.

If you want, you can do:

DT[, names(DT), with = FALSE]
Señor O
  • I don't understand how this answers the question...? – eddi Aug 05 '13 at 18:37
  • The question has nothing to do with `get` or `lapply`. It has to do with the fact that a character object evaluates to a character object in `data.table`. – Señor O Aug 05 '13 at 18:48
  • sorry señor, pero it has a whole lot to do with `get` and `lapply` :) The issue is not about whether `j` is an expression or not. The question is why does `lapply("Column1", function(x) get(x))` find `Column1` but `lapply("Column1", get)` does not. – Ricardo Saporta Aug 05 '13 at 18:48
  • ^ Yup, I misunderstood the question – Señor O Aug 05 '13 at 18:50