Apply `dplyr::rowwise` in all variables

Question

I have a data frame:

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

The following code works:

library(tidyverse)

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(c(x.1, x.3)))

But the following attempts (for all variables) don't work:

With `.`:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(.))

With `select_if`:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(select_if(., is.numeric)))

Both methods return:

Source: local data frame [30 x 5]
Groups: <by row>

# A tibble: 30 x 5
     x.1   x.2   x.3   x.4   var
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32.7  42.7  50.1  20.8 7091.
 2  75.9  71.3  83.6  77.6 7091.
 3  49.6  28.7  97.0  59.7 7091.
 4  47.4  96.1  31.9  79.7 7091.
 5  54.2  47.1  81.7  41.6 7091.
 6  27.9  58.1  97.4  25.9 7091.
 7  61.8  78.3  52.6  67.7 7091.
 8  85.4  51.3  38.8  82.0 7091.
 9  27.9  72.6  68.9  25.2 7091.
10  87.2  42.1  27.6  73.9 7091.
# ... with 20 more rows

where 7091 is an incorrect sum (the same value is repeated on every row).

How can I fix these functions?

You can use `rowSums`: `df_1 %>% mutate(var = rowSums(select(., starts_with('x.'))))` — IceCreamToucan, Apr 30 '19 at 14:25
Does [this](https://stackoverflow.com/questions/49396267/dplyr-rowwise-sum-and-other-functions-like-max) answer your question? — IceCreamToucan, Apr 30 '19 at 14:26
That doesn't work for me. I need the sum for each case (row) across **all** variables. — neves, Apr 30 '19 at 14:30
Can be done using `purrr` package: `df_1 %>% select(-y) %>% mutate(var = pmap(., lift(sum)))` — Artem Sokolov, Apr 30 '19 at 19:19
@GiovaniNeves: It's the same thing for `mean` and `sd`. Just use a different domain lifter: `df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )` — Artem Sokolov, Apr 30 '19 at 19:28

Artem Sokolov · Accepted Answer · 2019-04-30T19:40:24.960

This can be done using purrr::pmap, which passes a list of arguments to a function that accepts "dots". Since most functions like mean, sd, etc. work with vectors, you need to pair the call with a domain lifter:

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 68.48282
# 2  49.40462 47.00752 21.99248 78.87789 49.32063

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 13.88555
# 2  49.40462 47.00752 21.99248 78.87789 23.27958

The function sum accepts dots directly, so you don't need to lift its domain:

df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 197.2825

Everything conforms to the standard dplyr data processing, so all three can be combined as separate arguments to mutate:

df_1 %>% select(-y) %>% 
  mutate( v1 = pmap(., lift_vd(mean)),
          v2 = pmap(., lift_vd(sd)),
          v3 = pmap(., sum) )
#         x.1      x.2      x.3      x.4       v1       v2       v3
# 1  70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825
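
As a side note, the `lift_*()` helpers were deprecated in purrr 1.0.0, and `pmap()` returns a list column rather than a numeric one. A minimal sketch of the same idea, assuming a recent purrr, wraps each function so that it collects its arguments with `c(...)` and uses `pmap_dbl()` to get plain numeric columns. The seed is an assumption added only to make the sketch reproducible.

```r
library(dplyr)
library(purrr)

set.seed(1)  # hypothetical seed, not part of the original question
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# pmap_dbl() calls the function once per row, passing the row's values as
# separate (named) arguments; c(...) gathers them back into one vector.
res <- df_1 %>%
  select(-y) %>%
  mutate(
    v1 = pmap_dbl(., function(...) mean(c(...))),
    v2 = pmap_dbl(., function(...) sd(c(...))),
    v3 = pmap_dbl(., function(...) sum(...))
  )

head(res, 2)
```

Inside `mutate()`, `.` still refers to the data frame that was piped in, so `v2` and `v3` are computed from the four `x.*` columns only and not from the newly created `v1`.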

Thanks. But what about more than one function? For example, `mean`, `sd` and `var` (3 new columns)? Note that `mutate(var = pmap(., lift_vd(mean, sd, var)))` doesn't work. — neves, Apr 30 '19 at 19:37
@GiovaniNeves: Just combine those inside `mutate` like you would normally. See the edit above. — Artem Sokolov, Apr 30 '19 at 19:40

score 2 · Answer 2 · answered Apr 30 '19 at 17:21

I think this is tricky because the scoped variants of mutate (mutate_at, mutate_all, mutate_if) are generally aimed at executing a function on a specific column, instead of creating an operation that uses all columns.

The simplest solution I can come up with basically amounts to creating a vector (cols) that is then used to execute the summary operation:

library(dplyr)
library(purrr)

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]

cols
#> [1] "x.1" "x.2" "x.3" "x.4"

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(
    var = sum(!!!map(cols, as.name), na.rm = TRUE)
  )
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 30 x 5
#>      x.1   x.2   x.3   x.4   var
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  46.1  28.9  28.9  50.7  155.
#>  2  26.8  68.0  67.1  26.5  188.
#>  3  35.2  63.8  62.5  28.5  190.
#>  4  31.3  44.9  67.3  68.2  212.
#>  5  52.6  23.9  83.2  43.4  203.
#>  6  55.7  92.8  86.3  57.2  292.
#>  7  56.9  50.0  77.6  25.6  210.
#>  8  95.0  82.6  86.1  22.7  286.
#>  9  62.7  26.5  61.0  88.9  239.
#> 10  65.2  23.1  25.5  51.0  165.
#> # … with 20 more rows

Created on 2019-04-30 by the reprex package (v0.2.1)

NOTE: if you are unfamiliar with purrr, you can also use something like lapply, etc.

You can read more about these types of more tricky dplyr operations (!!, !!!, etc.) here:

https://dplyr.tidyverse.org/articles/programming.html
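
For readers on dplyr 1.0.0 or later (released after this answer was written), `c_across()` is meant for exactly this kind of row-wise call and avoids building `cols` by hand. The sketch below is an assumption about how the question could be handled with that newer API, not part of the original answer. It also shows why the attempt in the question fails: `.` is the whole piped data frame, so `sum(.)` is the grand total on every row.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only for reproducibility
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# The question's attempt: `.` is the entire data frame, not the current row
broken <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(.)) %>%
  ungroup()

# c_across() returns the current row's values as a single vector
fixed <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(c_across(everything()))) %>%
  ungroup()

head(fixed, 3)
```

Any function that takes a vector (`mean`, `sd`, `var`) can replace `sum` in the second pipeline.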

This is great! The one thing I would add in case some people find it useful is that this works because `sum()` accepts `...` as input. Some functions accept a vector (e.g. `entropy::entropy()`), in which case you can simply use `c()` to wrap the unpacking structure: `some_function(c(!!!map(cols, as.name)), other.args = blah)` — Felipe Gerard, Apr 14 '23 at 17:15
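
The `c()` wrapping described in the comment above can be sketched as follows for a function that expects a vector, here `sd`. This version uses `rlang::syms()` in place of `map(cols, as.name)`; that swap is an assumption of this sketch (both produce a list of symbols), and the seed is added only for reproducibility.

```r
library(dplyr)
library(rlang)

set.seed(1)  # hypothetical seed
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

cols <- setdiff(names(df_1), "y")

# !!! splices the symbols into the call, so the expression becomes
# sd(c(x.1, x.2, x.3, x.4)) and is evaluated once per row
res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(row_sd = sd(c(!!!syms(cols)))) %>%
  ungroup()

head(res, 3)
```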

zack · Answer 3 · 2019-04-30T18:58:58.027

A few approaches I've taken in the past:

use a pre-existing row-wise function (e.g. rowSums)
using reduce (which doesn't apply to all functions)
complicated transposing
custom function with pmap

Using pre-existing row-wise functions

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

library(tidyverse)

# rowSums
df_1 %>%
  mutate(var = rowSums(select(., -y))) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
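
Base R only ships `rowSums` and `rowMeans`, but other statistics can sometimes be built from them. As a rough sketch (the variable names here are illustrative, not from the answer), the row-wise sample variance follows directly from its definition:

```r
set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# Numeric matrix of the columns to summarise (everything except y)
X <- as.matrix(df_1[, setdiff(names(df_1), "y")])

row_mean <- rowMeans(X)
# X - row_mean recycles row_mean down each column, i.e. it subtracts each
# row's mean from that row's values; dividing by (n - 1) gives the sample variance
row_var <- rowSums((X - row_mean)^2) / (ncol(X) - 1)
row_sd  <- sqrt(row_var)

head(cbind(row_mean, row_var, row_sd))
```

This stays fully vectorised, which tends to matter more as the number of rows grows.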

Using Reduce

df_1 %>%
  mutate(var = reduce(select(., -y),`+`))  %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
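
`reduce` works here because `+` is a vectorised binary operation that can be folded over the columns. A short sketch of two other cases that fit the same pattern, assuming the same `df_1` as above: a row-wise maximum via `pmax`, and a row-wise mean obtained by dividing the folded sum by the number of columns. Functions such as `var` do not decompose into pairwise steps like this, which is why `reduce` does not apply to them.

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  mutate(
    # pmax(a, b) is the element-wise maximum, so folding it over the
    # columns leaves the largest value in each row
    row_max  = reduce(select(., -y), pmax),
    # the mean is the folded sum divided by the number of columns
    row_mean = reduce(select(., -y), `+`) / ncol(select(., -y))
  )

head(res)
```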

Ugly transposing and matrix / data.frame conversion

df_1 %>%
  mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y       var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2  43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785

Custom function with `pmap`

my_var <- function(...){
  vec <-  c(...)
  var(vec)
}

df_1 %>%
  mutate(var = select(., -y) %>% pmap(my_var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679

Created on 2019-04-30 by the reprex package (v0.2.1)

Instead of `+`, can I use `mean` or `var` (in the answer with `reduce`)? How can I do this? — neves, Apr 30 '19 at 17:55
I've updated it with `var`, using a different strategy. It's not particularly elegant (I'm assuming there's some row-wise custom functions for many things), but this approach would generally work as long as all columns `-y` are of the same type. — zack, Apr 30 '19 at 18:33

qwr · Answer 4 · 2019-08-23T02:03:44.630

This is a tricky problem since dplyr operates column-wise for many operations. I originally used apply from base R to apply over rows, but apply is problematic when handling character and numeric types.
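
To illustrate the problem with `apply`: it converts the data frame to a matrix first, and a matrix can hold only one type. The small data frame below is a made-up example, not data from the question.

```r
# Hypothetical data frame mixing numeric and character columns
df <- data.frame(
  a = c(1.5, 2.5),
  b = c(3, 4),
  label = c("u", "v")
)

# Fine when only numeric columns are passed in
num_only <- apply(df[, c("a", "b")], 1, sum)

# With a character column present, as.matrix() (which apply() relies on)
# turns every value into a string, so numeric functions no longer work
m <- as.matrix(df)
is.character(m)  # TRUE
```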

Instead we can use (the aging) plyr and adply to do this simply, since plyr lets us treat a one-row data frame as a vector:

df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))

Note some functions like var won't work on a one-row data frame so we need to convert to vector using as.numeric.
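
A minimal sketch of the `as.numeric` conversion mentioned above, using a small made-up data frame and calling `adply` through the `plyr::` prefix so that plyr does not mask dplyr functions:

```r
# Hypothetical two-row example; column names mirror the question's data
df <- data.frame(
  x.1 = c(10, 1),
  x.2 = c(20, 2),
  x.3 = c(30, 3),
  x.4 = c(40, 5)
)

# Each call receives a one-row data frame; as.numeric() flattens it to a
# plain vector so that var() computes the variance of the row's values
res <- plyr::adply(df, 1, function(row) c(v1 = var(as.numeric(row[1, ]))))
res
```

The same wrapper should work for other functions that expect a numeric vector.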
