purrr approach
We can use a combination of nest and map_dbl for this:
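library(tidyverse)

# Written against the tidyr API of the time; tidyr >= 1.0 expects
# nest(data = -blob) and unnest(data) instead.
df %>%
  rowwise() %>%
  nest(-blob) %>%
  mutate(s = map_dbl(data, FUN)) %>%
  unnest()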
Let's break that down a little. First, rowwise makes each subsequent function operate one row at a time, which supports arbitrarily complex operations that need to be applied to each row.
Next, nest creates a new list-column, data, holding the data to be fed into FUN (the beauty of tibbles vs data.frames!). Since we are applying this rowwise, each row contains a single-row tibble of the columns exclude:D.
Finally, we use map_dbl to apply FUN to each of these tibbles. map_dbl is chosen over the other map_* functions because our intended output is numeric (i.e. double). unnest then returns our tibble to the more standard flat structure.
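To make the intermediate structure concrete, here is a minimal sketch of the three steps. The data frame and FUN are not given in this answer, so both are hypothetical reconstructions inferred from the output table shown further down (FUN is assumed to sum B, C and D while skipping the column named in exclude). The sketch also assumes tidyr 1.0 or later, hence the named nest(data = ...) form, and it leaves out rowwise, relying instead on blob being unique per row in this sample.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data, reconstructed from the output table in this answer
df <- tibble(
  exclude = c("B", "B", "D"),
  B = c(1, 0, 0),
  C = c(3, 4, 9),
  D = c(1, 1, 0),
  blob = c("fd", "fs", "sa")
)

# Hypothetical FUN: sum B, C and D, skipping the column named in `exclude`
FUN <- function(row) {
  keep <- setdiff(c("B", "C", "D"), row$exclude)
  sum(unlist(row[keep]))
}

# Step 1: nest everything except blob into a list-column called `data`.
# blob is unique here, so each element of `data` is a one-row tibble.
nested <- df %>% nest(data = -blob)

# Step 2: map FUN over the one-row tibbles; map_dbl() returns a double vector
scored <- nested %>% mutate(s = map_dbl(data, FUN))

# Step 3: flatten the list-column back into ordinary columns
result <- scored %>% unnest(data)
```

Printing nested before the mutate step is a quick way to confirm that each element of data really is a single-row tibble. If blob could repeat across rows, a row identifier would need to be kept outside the nested columns as well.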
purrrlyr approach
While purrrlyr may not be as 'popular' as its parents dplyr and purrr, its by_row function has some utility here.
In your above example, we would use your data frame df and user-defined function FUN in the following way:
library(purrrlyr)

df %>%
  by_row(..f = FUN, .to = "s", .collate = "cols")
That's it! Giving you:
# tibble [3 x 6]
  exclude     B     C     D blob      s
  <chr>   <dbl> <dbl> <dbl> <chr> <dbl>
1 B           1     3     1 fd        4
2 B           0     4     1 fs        5
3 D           0     9     0 sa        9
Admittedly, the syntax is a little strange, but here's how it breaks down:
..f = the function to apply to each row
.to = the name of the output column, in this case s
.collate = the way the results should be collated: by list, row, or column. Since FUN only has a single output, we would be fine to use either "cols" or "rows".
See here for more information on using purrrlyr ...
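As an illustration of what .collate changes, the sketch below runs by_row twice on the same input. The data frame and FUN are again hypothetical reconstructions based on the output table above, not the originals from the question.

```r
library(tibble)
library(purrrlyr)

# Hypothetical data, reconstructed from the output table in this answer
df <- tibble(
  exclude = c("B", "B", "D"),
  B = c(1, 0, 0),
  C = c(3, 4, 9),
  D = c(1, 1, 0),
  blob = c("fd", "fs", "sa")
)

# Hypothetical FUN: sum B, C and D, skipping the column named in `exclude`
FUN <- function(row) {
  keep <- setdiff(c("B", "C", "D"), row$exclude)
  sum(unlist(row[keep]))
}

# .collate = "list" keeps each per-row result as an element of a list-column
as_list <- by_row(df, ..f = FUN, .to = "s", .collate = "list")

# .collate = "cols" unpacks the single numeric result into a plain column
as_cols <- by_row(df, ..f = FUN, .to = "s", .collate = "cols")
```

With "list" the new column s needs a further unnest or unlist before it can be used as a number, which is why "cols" is the more convenient choice for a scalar-valued FUN.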
Performance
A word of warning: while I like the functionality of by_row, it's not always the best approach for performance! The purrr approach is more intuitive, but comes at a rather large speed cost. See the following microbenchmark test:
library(microbenchmark)

# Time each row-wise strategy on the same df and FUN
mbm <- microbenchmark(
  purrr.test = df %>%
    rowwise() %>%
    nest(-blob) %>%
    mutate(s = map_dbl(data, FUN)) %>%
    unnest(),
  purrrlyr.test = df %>%
    by_row(..f = FUN, .to = "s", .collate = "cols"),
  rowwise.test = df %>%
    rowwise() %>%
    do({
      result <- as_tibble(.)
      result$s <- FUN(result)
      result
    }),
  group_by.test = df %>%
    group_by(1:n()) %>%
    do({
      result <- as_tibble(.)
      result$s <- FUN(result)
      result
    }),
  sapply.test = {df$s <- sapply(1:nrow(df), function(x) FUN(df[x, ]))},
  times = 1000
)

# autoplot() on a microbenchmark object requires ggplot2 (attached by tidyverse)
autoplot(mbm)

You can see that the purrrlyr approach is faster than combining do with rowwise or group_by(1:n()) (see @konvas's answer), and roughly on par with the sapply approach. However, the package is admittedly not the most intuitive. The standard purrr approach seems to be the slowest, but is perhaps the easiest to work with. Different user-defined functions may change the speed order.