
I have a data frame. For argument's sake, let's say it's the datasets::women data frame. I want to create a vector from the frame by applying a function to each row.

It seems that the usual way to do this is to use dplyr and call mutate or transmute, for example:

dplyr::transmute(women, some_index = 2 * height + weight)

Great: that works. But what if I pull out the calculation of some_index into a function which acts on a row:

calc_some_index <- function(woman) {
    2 * woman$height + woman$weight
}

Is there a way to call mutate/transmute so that it applies this function to each row of its input?

Of course, I can see that I get the right result if I call

dplyr::transmute(women, some_index=calc_some_index(women))

but I believe this is just 'cheating': the vector is calculated up front from the full frame and then substituted into the transmute call. It doesn't work, for instance, if I call:

dplyr::transmute(head(women, n=10), some_index=calc_some_index(women))
Peter

1 Answer


I think you're running into a dimension mismatch.

If I do

library(dplyr)
transmute(head(women, n=10),
          some_index=calc_some_index(head(women,10)))

then it works. (The error from your code complained about differing sizes: the function returned 15 values for a 10-row frame.)

Alternatively, you can use the pipe, where `.` stands for the data frame being piped in:

head(women, 10) %>%
   transmute(some_index = calc_some_index(.))
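If the pipe is not wanted, one possible variant is sketched below. It assumes dplyr 1.1.0 or later, where `pick()` is available; that function postdates this thread, so treat this as an illustration under that assumption rather than the recommended approach. `pick(everything())` hands the current data to the function as a data frame, so the frame does not have to be named twice.

```r
library(dplyr)

# Function from the question: expects a data frame with height and weight columns
calc_some_index <- function(woman) {
    2 * woman$height + woman$weight
}

# Assumption: dplyr >= 1.1.0, which provides pick().
# pick(everything()) returns all columns of the data that transmute() is working on,
# so head(women, 10) is written only once and no pipe is needed.
via_pick <- transmute(head(women, 10),
                      some_index = calc_some_index(pick(everything())))
```

Note that `calc_some_index` is still called once on the whole 10-row frame and relies on vectorised arithmetic; it is not invoked separately for each row.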
PavoDive
  • Yeah, maybe that wasn't the clearest illustration of the problem. I know that I can take *whatever* I have in the first position (be it `women` or `head(women, n=10)`) and use it as the argument to the function in the second position, but I don't want to have to repeat myself like that. What you've proposed with the pipe is close to what I want, and I understand how it works (i.e. `.` just represents the full frame that gets piped in), but can it be done without the pipe? – Peter Mar 15 '16 at 02:31
  • I think that your suggestion of using the pipe operator `%>%` and calling functions on the `.` variable is the best solution. On digging a bit deeper, I learned that `dplyr` provides the short-form column references (e.g. `height` rather than `foo$height`) by using `eval` with a custom environment. To make it work the way I was thinking, the expression being evaluated in `eval` would need to directly reference its enclosing environment, which I don't think is possible. – Peter Apr 11 '16 at 02:25
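The mechanism mentioned in the last comment can be illustrated in base R. This is only a minimal sketch of the general idea (evaluating an expression with a data frame supplying the variable names); dplyr's actual implementation is more involved and is not reproduced here.

```r
# An unevaluated expression that uses bare column names
expr <- quote(2 * height + weight)

# eval() accepts a data frame as `envir`, so height and weight resolve to
# the columns of women; anything not found there is looked up in `enclos`
some_index <- eval(expr, envir = women, enclos = parent.frame())

# The same values, computed with explicit column references
explicit <- 2 * women$height + women$weight
```

Inside such an expression the column names are visible, but there is no name bound to the data frame as a whole, which is why a function that expects the full frame needs something like `.` from the pipe.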