Consider this example:

data = data.frame(
  id = c(1, 2, 3, 4, 5),
  value = runif(5)
)

bar <- function(N) {
  if (length(N) > 1)
    stop("N too long")
  return(N)
}

data = transform(data, foo = bar(value))

My code errors out at stop("N too long"). I expected only one value at a time.

Why does R pass a complete column to N rather than just one value at a time? To me, this is very counterintuitive.

What can I do instead when I want a new column based on a function's return value, given that the function may take more than one argument? The function does not work with vectors; it needs to be run one row at a time.

Certainly, the solution can't be to do:

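library(plyr)  # provides ddply() and .()

data = ddply(data, .(id), function(row) {
  # transform() needs the data frame as its first argument
  return(transform(row, foo = bar(row$value)))
})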
slhck
  • R works with vectors. We are all very happy that the whole vector gets passed in your example, because that ensures best performance (speed). – Roland Mar 27 '15 at 12:33
  • What exactly did you expect it to do? `data$value` is a vector. So `bar(value)` passes `N=value` so `N` is a vector. Did you want to operate on `data$value[1]` then `data$value[2]` and so on? – konvas Mar 27 '15 at 12:34
  • @konvas Exactly that. I absolutely expected it to pass each row's value to the function. At least nowhere it's mentioned that these functions use vectors, and from the simple examples where you never see a function called, you *could* think that it does it on a per-row basis. – slhck Mar 27 '15 at 12:37
  • Recommended reading: [R-intro](http://cran.r-project.org/doc/manuals/R-intro.pdf), [R language definition](http://cran.r-project.org/doc/manuals/R-lang.pdf), [The R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf) – Roland Mar 27 '15 at 12:39
  • Eh, okay, the documentation says "The `...` arguments to `transform.data.frame` are tagged vector expressions", but what a "tagged vector expression" is, I don't know either, and you can't look that up. – slhck Mar 27 '15 at 12:39
  • @Roland Thanks, however note that none of these mention `transform` explicitly. What would be the most R-like way to accomplish what I need? – slhck Mar 27 '15 at 12:45
  • Yes, in your case you use `length`, which according to the help files takes a vector as its argument. The most R-like way is to use vectors, because otherwise R is terribly slow. Why don't you update the question with the specific task that you want to accomplish? – konvas Mar 27 '15 at 12:45
  • I just want to evaluate a rather complex function `bar()` on every row of a data frame, to obtain a new column. And `bar` takes multiple arguments from some of the data frame's columns, and it can not be run on vectors. – slhck Mar 27 '15 at 12:48
  • Sorry I meant `length`. If you want to do something rowwise you can use a `for` loop, but in most cases this is not the best way to do things, or something like `lapply(1:nrow(data), function(i) bar(data$value[i]))` – konvas Mar 27 '15 at 12:50

1 Answer

This is what I finally did:

# drop = FALSE keeps the one-column selection a data frame, which apply() needs
data$foo = apply(data[, c('value'), drop = FALSE], 1, function(row) bar(row['value']))

Of course, it's also possible to use more columns with this approach.
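
For the multi-argument case, one alternative to `apply` is base R's `mapply`, which calls a function once per set of paired elements, or `Vectorize`, which wraps a scalar-only function so it can be used inside `transform`. The sketch below is only an illustration: `bar2` is a hypothetical two-argument stand-in for the real function, and fixed values replace `runif(5)` so the result is reproducible.

```r
# Hypothetical stand-in for a function that only accepts one value per argument.
bar2 <- function(x, y) {
  if (length(x) > 1 || length(y) > 1)
    stop("arguments too long")
  x + y
}

data <- data.frame(
  id = c(1, 2, 3, 4, 5),
  value = c(0.1, 0.2, 0.3, 0.4, 0.5)
)

# mapply() calls bar2() once per row, pairing up the elements of both columns.
data$foo <- mapply(bar2, data$id, data$value)

# Vectorize() builds a wrapper around mapply(), so the wrapped function
# can be called on whole columns inside transform().
bar2_vec <- Vectorize(bar2)
data <- transform(data, foo2 = bar2_vec(id, value))
```

Both variants still call the function once per row, so they are no faster than an explicit loop; they only hide it.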

See also: For each row in an R dataframe

slhck