
I am writing a wrapper to combine any number of datasets row-wise. Since some datasets may have unique variables, I first restrict each one to a common set of variables.

My function looks like this:

rcombine <- function(List, Vars) {
  List2 <- lapply(List, subset, select=Vars)
  Reduce(rbind, List2)
}

When I run the code directly, it works. But in the function, my variable Vars disappears.

For instance:

x <- data.frame('a'=sample(LETTERS, 10), 'b'=sample(LETTERS, 10), 'c'=sample(LETTERS, 10))
y <- data.frame('a'=sample(LETTERS, 10), 'b'=sample(LETTERS, 10), 'e'=sample(LETTERS, 10))

rcombine(list(x, y), c('a', 'b'))

gives me:

Error in eval(expr, envir, enclos) : object 'Vars' not found

but running:

List <- list(x, y)
Reduce(rbind, lapply(List, subset, select=c('a','b')))

works. I can print Vars from inside the function, but inside lapply it disappears. What is going on?

AdamO


subset really shouldn't be used for this kind of thing. From its help page:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
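Here is a minimal sketch of that "unanticipated consequence", assuming `subset.data.frame` evaluates `select` as `eval(substitute(select), nl, parent.frame())`, as it does in the R sources I am aware of. When `subset` is called directly from a function, `parent.frame()` is that function's frame, so the argument `Vars` is found. When it is called through `lapply`, `parent.frame()` is `lapply`'s own frame, whose enclosing environments are the base namespace and then the global environment, so the function's local `Vars` is never visible. `pick_direct` and `pick_lapply` are hypothetical helpers made up for this illustration.

```r
# Hypothetical helpers, for illustration only.
# Direct call: parent.frame() inside subset.data.frame is pick_direct's
# frame, so the argument Vars is found there.
pick_direct <- function(dat, Vars) subset(dat, select = Vars)

# Via lapply: parent.frame() is lapply's own frame. Its enclosures are the
# base namespace and then the global environment, so pick_lapply's local
# Vars is never visible.
pick_lapply <- function(List, Vars) lapply(List, subset, select = Vars)

dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)

direct_names <- names(pick_direct(dat, c("a", "b")))
direct_names  # "a" "b"

# Fails with an "object 'Vars' not found" error (the message may be translated)
msg <- tryCatch(pick_lapply(list(dat), c("a", "b")),
                error = function(e) conditionMessage(e))

# Worse: a global Vars, if one exists, is silently used instead of the argument
assign("Vars", "c", envir = globalenv())
global_names <- names(pick_lapply(list(dat), c("a", "b"))[[1]])
global_names  # "c"
rm("Vars", envir = globalenv())
```

The last case is arguably the more dangerous one: no error is raised, and the wrong columns come back.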

For your particular problem, I don't see why simply replacing subset with a direct call to "[" would be a problem.

rcombine <- function(List, Vars) {
  List2 <- lapply(List, "[", i= , j = Vars, drop = FALSE) # here is the change
  Reduce(rbind, List2)
}

# alternatively...
rcombine <- function(List, Vars) {
  List2 <- lapply(List, function(x){x[, Vars, drop = FALSE]}) # here is the change
  Reduce(rbind, List2)
}

x <- data.frame('a'=sample(LETTERS, 10), 'b'=sample(LETTERS, 10), 'c'=sample(LETTERS, 10))
y <- data.frame('a'=sample(LETTERS, 10), 'b'=sample(LETTERS, 10), 'e'=sample(LETTERS, 10))

rcombine(list(x, y), c('a', 'b'))
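If the goal is always to keep just the variables shared by every dataset, `Vars` could default to that intersection. This is a sketch of an optional extension, not part of the original answer: the default `Reduce(intersect, lapply(List, names))` is my own assumption about what "common variables" should mean, and `set.seed(1)` is only there to make the example reproducible.

```r
# Sketch: Vars defaults to the columns shared by every data frame in List
rcombine <- function(List, Vars = Reduce(intersect, lapply(List, names))) {
  # Plain `[` with drop = FALSE: no non-standard evaluation involved,
  # and single-column selections stay data frames
  List2 <- lapply(List, function(x) x[, Vars, drop = FALSE])
  Reduce(rbind, List2)
}

set.seed(1)
x <- data.frame(a = sample(LETTERS, 10), b = sample(LETTERS, 10), c = sample(LETTERS, 10))
y <- data.frame(a = sample(LETTERS, 10), b = sample(LETTERS, 10), e = sample(LETTERS, 10))

res <- rcombine(list(x, y))   # uses the shared columns "a" and "b"
dim(res)    # 20 2
names(res)  # "a" "b"

res_a <- rcombine(list(x, y), "a")   # an explicit Vars still works
dim(res_a)  # 20 1
```

Because the default is evaluated lazily inside the function, it can refer to `List` directly.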
Dason
  • That does fix the problem. A bit sad, because I feel like `subset` makes the code more readable, especially for non-programmers. It looks like the reason it fails is that the `subset.data.frame` method uses an `eval` statement in its setup, which seems like the likely culprit. – AdamO Apr 16 '13 at 22:57
  • awesome answer that really highlights the quirks of R! – daikonradish Apr 17 '13 at 01:05
  • 1
    Calling primitive functions (e.g. `[`) like this is a bit risky - I think you're safer to use an anonymous function: `lapply(List, function(j) x[, j, drop = FALSE])`. Also note `drop = FALSE` otherwise the code doesn't work when `Vars` is length one. – hadley Apr 17 '13 at 02:08
  • @ashkan subset is for interactive analysis, not for programming. – hadley Apr 17 '13 at 02:09
  • @hadley Can you elaborate on why you think calling `[` like this is 'risky'? – Dason Apr 17 '13 at 02:38
  • @Dason -- see the *note* section of `?lapply`, especially *This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.* For an example on SO see http://stackoverflow.com/questions/14928278/why-does-lapply-not-retain-my-data-table-keys – mnel Apr 17 '13 at 03:39
  • 1
    @dason and because primitive functions don't do named matching, so `lapply(List, "[", j = Vars)` doesn't do what you expect. – hadley Apr 17 '13 at 12:37
  • @hadley If `[` doesn't do name based matching, why does Dason's example work? – AdamO Apr 17 '13 at 17:46
  • @ashkan positional matching works as well. You can leave off the i= and the j= in the example and it will still work. The issue Hadley brings up is that even if you did `j=Vars, i=` it wouldn't matter and it would think Vars was specifying rows. I put the i= and j= in there to make it a little more clear what was supposed to happen. – Dason Apr 17 '13 at 17:50
  • @Dason I see, I thought by positional matching, we were referring to the column index for `Vars` in the various `data.frames` in `List`. I can see how that would be confusing. – AdamO Apr 17 '13 at 17:52
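hadley's point about `drop = FALSE` is easy to check with a single variable. In this minimal sketch (the names `dat` and `one_var` are just for illustration), selecting one column without `drop = FALSE` returns a plain vector, and `Reduce(rbind, ...)` then stacks those vectors into a matrix instead of stacking rows of data frames.

```r
dat <- data.frame(a = 1:3, b = 4:6)
one_var <- "a"

class(dat[, one_var])                 # "integer": the column is dropped to a vector
class(dat[, one_var, drop = FALSE])   # "data.frame"

# Without drop = FALSE, rbind stacks the vectors into a 2 x 3 matrix
m <- Reduce(rbind, lapply(list(dat, dat), function(x) x[, one_var]))
dim(m)   # 2 3

# With drop = FALSE, the rows stack as intended: a 6 x 1 data frame
d <- Reduce(rbind, lapply(list(dat, dat), function(x) x[, one_var, drop = FALSE]))
dim(d)   # 6 1
```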