
I'd like to create a function "startswith" to be used within brackets in data.table. It should return a character vector containing the column names that begin with the string provided. For example:

DT <- data.table(x=1, y=2, z1=1, z2=2)
# the syntax DT[, startswith("z")] is equivalent to  
DT[, .(z1, z2)]
# returns
   z1 z2
1:  1  2

I'm familiar with grep to search for text expressions, but am having trouble finding a way to refer to the column names of DT from within the brackets. One solution I attempted was to use ls() and the environment associated with DT to list all of the columns in DT, but I haven't found a way to refer to this environment from within the brackets.

The goal is to create a wrapper for grep to be used as a convenience function. I don't want to have to specify the DT from within the brackets.

k13
  • Maybe the data.table developers can add a `.COLNAMES` object (like `.I`, `.BY`, et al.), to be used inside the brackets for things like this. Personally, I would just do this sort of operation in two lines. – Frank Apr 03 '15 at 17:01
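
The two-line approach mentioned in the comment above might look like the following sketch. The variable name `cols` is an assumption for illustration, and the column names are looked up outside the brackets rather than inside them:

```r
library(data.table)

DT <- data.table(x = 1, y = 2, z1 = 1, z2 = 2)

# Line 1: collect the column names that start with "z"
cols <- grep("^z", names(DT), value = TRUE)

# Line 2: select those columns by name
res <- DT[, cols, with = FALSE]
```

This avoids any inspection of the call stack, at the cost of naming DT twice.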

2 Answers


Surely there is a more idiomatic approach, but this is what I came up with:

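startswith <- function(pattern = "z") {

  # Anchor the pattern so it only matches at the start of a column name
  re <- paste0("^", pattern)

  # Text of the outermost call on the stack, e.g. "DT[, startswith(), with = F]"
  call_info <- deparse(sys.calls()[[1]])

  if (grepl("(^.+\\()(.+)(\\)$)",call_info)) {
    # Outermost call looks like Foo(DT): take the text between the parentheses
    this_name <- sub("(^.+\\()(.+)(\\)$)","\\2",call_info)
  } else {
    # Outermost call looks like DT[...]: take everything before the first "["
    this_name <- strsplit(call_info,"\\[")[[1]][1]
  }

  this <- copy(get(this_name))
  this_names <- names(this)

  # Integer positions of the matching columns
  eval.parent(grep(re,this_names))

}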

library(data.table)
DT <- data.table(x=1, y=2, z1=1, z2=2)
##
R> DT[,.(z1, z2)]
   z1 z2
1:  1  2
##
R> DT[,startswith(), with=F]
   z1 z2
1:  1  2

I had to add in that if () {} else {} block so that this could be used inside of functions, e.g.

Foo <- function(gt) {
  f <- gt[,startswith(),with=F]
  # {do something interesting with f}
  f
}
##
R> Foo(DT)
   z1 z2
1:  1  2
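
To see why both branches are needed, the two regular-expression steps can be tried on their own. This is only a sketch in base R: the strings below are hand-written stand-ins for what `deparse(sys.calls()[[1]])` would produce, and `Foo(DT, 5)` is a hypothetical two-argument call added for illustration.

```r
re_call <- "(^.+\\()(.+)(\\)$)"

# Outermost call is a function call: the name sits between the parentheses
in_function <- sub(re_call, "\\2", "Foo(DT)")

# Outermost call is the bracket expression: the name precedes the first "["
at_top_level <- strsplit("DT[, startswith(), with = F]", "\\[")[[1]][1]

# With more than one argument the captured text is no longer a single object name
two_args <- sub(re_call, "\\2", "Foo(DT, 5)")
```

Here `in_function` and `at_top_level` are both `"DT"`, while `two_args` is `"DT, 5"`, which `get()` could not resolve to an object.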

I think this is an interesting question though. To my knowledge, R doesn't have a concept of something like the this pointer in C++, but it would certainly be useful in situations like this. Essentially, all of my hackery with sys.calls, get, etc. was just so I could retrieve the names of the calling object.

nrussell
  • Nice function. You came up with it on your own? – David Arenburg May 12 '15 at 12:25
  • Why do you use `copy(get(this_name))` and not just `get(this_name)`? – A5C1D2H2I1M1N2O1R2T1 May 12 '15 at 12:41
  • @AnandaMahto I guess it's just a habit I've fallen into with `data.table`s because sometimes I'm not completely sure about whether or not the changes I make to the local object when modifying by reference will propagate to the original `data.table`. Although I'm not doing any modify-by-reference in this case, so you're right in that the use of `copy` is unnecessary here. – nrussell May 12 '15 at 14:18
  • @DavidArenburg Thank you, yes I did. To be honest though I was hoping that someone would have a cleaner solution because I think there are still some issues with this approach. For example, if the enclosing function `Foo` from my example had multiple arguments, I'm not sure that `startswith` would work properly. – nrussell May 12 '15 at 14:23

Recent versions of data.table support pattern searching in .SDcols:

DT <- data.table(x=1, y=2, z1=1, z2=2)
DT[, .SD, .SDcols = patterns('^z')]
#    z1 z2
# 1:  1  2
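
As a further sketch, the same `patterns()` selection can feed a computation over the matched columns rather than just returning them. The name `z_sums` is an assumption for illustration, and this presumes a data.table version that accepts `patterns()` in `.SDcols`:

```r
library(data.table)

DT <- data.table(x = 1, y = 2, z1 = 1, z2 = 2)

# Sum every column whose name starts with "z"
z_sums <- DT[, lapply(.SD, sum), .SDcols = patterns("^z")]
```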
MichaelChirico
Avinash Raj
  • The point is to create a wrapper for grep to be used as a convenience function. I don't want to have to specify the DT from within the brackets. – k13 Apr 03 '15 at 15:58