I have a data frame:

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

The following code works:

library(tidyverse)

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(c(x.1, x.3)))

But the following attempts (summing over all variables) don't work:

With .:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(.))

With select_if:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(select_if(., is.numeric)))

Both methods return:

Source: local data frame [30 x 5]
Groups: <by row>

# A tibble: 30 x 5
     x.1   x.2   x.3   x.4   var
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32.7  42.7  50.1  20.8 7091.
 2  75.9  71.3  83.6  77.6 7091.
 3  49.6  28.7  97.0  59.7 7091.
 4  47.4  96.1  31.9  79.7 7091.
 5  54.2  47.1  81.7  41.6 7091.
 6  27.9  58.1  97.4  25.9 7091.
 7  61.8  78.3  52.6  67.7 7091.
 8  85.4  51.3  38.8  82.0 7091.
 9  27.9  72.6  68.9  25.2 7091.
10  87.2  42.1  27.6  73.9 7091.
# ... with 20 more rows

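Here 7091 is an incorrect sum: it is the total of every value in the data frame, repeated on each row, rather than a per-row sum.

How can I adjust these functions?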

neves

4 Answers

This can be done using purrr::pmap, which passes a list of arguments to a function that accepts "dots". Since most functions like mean, sd, etc. work with vectors, you need to pair the call with a domain lifter:

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 68.48282
# 2  49.40462 47.00752 21.99248 78.87789 49.32063

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 13.88555
# 2  49.40462 47.00752 21.99248 78.87789 23.27958

The function sum accepts dots directly, so you don't need to lift its domain:

df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 197.2825

Everything conforms to the standard dplyr data processing, so all three can be combined as separate arguments to mutate:

df_1 %>% select(-y) %>% 
  mutate( v1 = pmap(., lift_vd(mean)),
          v2 = pmap(., lift_vd(sd)),
          v3 = pmap(., sum) )
#         x.1      x.2      x.3      x.4       v1       v2       v3
# 1  70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825
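
A variation on the approach above, offered as a sketch rather than a replacement: pmap returns a list, so the new columns are list columns, and the lift_* helpers are, as far as I know, deprecated in purrr 1.0.0 and later. Assuming a typed result is wanted, pmap_dbl with a small wrapper (row_stat is a hypothetical helper defined here, not part of purrr) gives plain numeric columns:

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# Hypothetical helper (an assumption, not a purrr function):
# turns a function of one vector into a function that accepts dots
row_stat <- function(f) function(...) f(c(...))

xs <- select(df_1, -y)

res <- xs %>%
  mutate(
    v1 = pmap_dbl(xs, row_stat(mean)),  # row-wise mean
    v2 = pmap_dbl(xs, row_stat(sd)),    # row-wise standard deviation
    v3 = pmap_dbl(xs, sum)              # sum already accepts dots
  )
```

Because pmap_dbl insists on one double per row, it fails loudly if the wrapped function returns anything else.
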
Artem Sokolov
  • Thanks. But what about more than one function? For example, `mean`, `sd` and `var` (3 new columns): `mutate(var = pmap(., lift_vd(mean, sd, var)))` doesn't work. – neves Apr 30 '19 at 19:37
  • @GiovaniNeves: Just combine those inside `mutate` like you would normally. See the edit above. – Artem Sokolov Apr 30 '19 at 19:40
  • Great solution! Thanks, @Artem Sokolov! – neves Apr 30 '19 at 19:42

I think this is tricky because the scoped variants of mutate (mutate_at, mutate_all, mutate_if) are generally aimed at executing a function on a specific column, instead of creating an operation that uses all columns.

The simplest solution I can come up with basically amounts to creating a vector (cols) that is then used to execute the summary operation:

library(dplyr)
library(purrr)

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]

cols
#> [1] "x.1" "x.2" "x.3" "x.4"

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(
    var = sum(!!!map(cols, as.name), na.rm = TRUE)
  )
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 30 x 5
#>      x.1   x.2   x.3   x.4   var
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  46.1  28.9  28.9  50.7  155.
#>  2  26.8  68.0  67.1  26.5  188.
#>  3  35.2  63.8  62.5  28.5  190.
#>  4  31.3  44.9  67.3  68.2  212.
#>  5  52.6  23.9  83.2  43.4  203.
#>  6  55.7  92.8  86.3  57.2  292.
#>  7  56.9  50.0  77.6  25.6  210.
#>  8  95.0  82.6  86.1  22.7  286.
#>  9  62.7  26.5  61.0  88.9  239.
#> 10  65.2  23.1  25.5  51.0  165.
#> # … with 20 more rows

Created on 2019-04-30 by the reprex package (v0.2.1)

NOTE: if you are unfamiliar with purrr, you can also use something like lapply, etc.

You can read more about these types of more tricky dplyr operations (!!, !!!, etc.) here:

https://dplyr.tidyverse.org/articles/programming.html
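
For readers on dplyr 1.0.0 or later, the same row-wise sum can probably be written without unquoting by using c_across(), which collects the selected columns of the current row into one vector. This is a sketch under that version assumption, not what the answer above used; the column names row_sum and row_sd are illustrative:

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# numeric columns except y, built with base R instead of purrr
cols <- setdiff(names(df_1)[vapply(df_1, is.numeric, logical(1))], "y")

res <- df_1 %>%
  rowwise() %>%
  mutate(
    row_sum = sum(c_across(all_of(cols)), na.rm = TRUE),
    row_sd  = sd(c_across(all_of(cols)))  # vector functions work too
  ) %>%
  ungroup()
```

Since c_across() returns a plain vector, functions that take a single vector argument need no c() wrapper.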

cole
  • This is great! The one thing I would add in case some people find it useful is that this works because `sum()` accepts `...` as input. Some functions accept a vector (e.g. `entropy::entropy()`), in which case you can simply use `c()` to wrap the unpacking structure: `some_function(c(!!!map(cols, as.name)), other.args = blah)` – Felipe Gerard Apr 14 '23 at 17:15

A few approaches I've taken in the past:

  • using a pre-existing row-wise function (e.g. rowSums)
  • using reduce (which doesn't apply to all functions)
  • complicated transposing
  • a custom function with pmap

Using pre-existing row-wise functions

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

library(tidyverse)

# rowSums
df_1 %>%
  mutate(var = rowSums(select(., -y))) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746

Using reduce

df_1 %>%
  mutate(var = reduce(select(., -y),`+`))  %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
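
reduce folds a two-argument function across the columns, so it suits operations that combine pairwise (`+`, pmax, pmin) but not functions such as var that need all of a row's values at once. A sketch of what that still allows, assuming the same df_1 as above; a row mean can be derived from the `+` fold by dividing by the number of columns (row_max and row_mean are illustrative names):

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

xs <- select(df_1, -y)

res <- df_1 %>%
  mutate(
    row_max  = reduce(xs, pmax),            # element-wise max, folded over columns
    row_mean = reduce(xs, `+`) / ncol(xs)   # sum of columns divided by column count
  )
```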

Ugly transposing and matrix / data.frame conversion

df_1 %>%
  mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y       var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2  43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785
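
The transpose-and-convert pipeline above does roughly what base R's apply does over the rows of a numeric matrix. A possible shorter form, assuming every selected column is numeric (apply works on a matrix, so mixed column types would all be coerced to character):

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  # MARGIN = 1 applies the function to each row of the matrix;
  # stats::var is spelled out so the new column name cannot shadow it
  mutate(var = apply(as.matrix(select(., -y)), 1, stats::var))
```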

Custom function with pmap

my_var <- function(...){
  vec <-  c(...)
  var(vec)
}

df_1 %>%
  mutate(var = select(., -y) %>% pmap(my_var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679

Created on 2019-04-30 by the reprex package (v0.2.1)

zack
  • Instead of `+`, can I use `mean` or `var` (in the answer with `reduce`)? How can I do this? – neves Apr 30 '19 at 17:55
  • I've updated it with `var`, using a different strategy. It's not particularly elegant (I'm assuming there are row-wise custom functions for many things), but this approach would generally work as long as all columns `-y` are of the same type. – zack Apr 30 '19 at 18:33

This is a tricky problem since dplyr operates column-wise for many operations. I originally used apply from base R to apply over rows, but apply coerces its input to a matrix, which is problematic when the data frame mixes character and numeric columns.

Instead we can use (the aging) plyr and adply to do this simply, since plyr lets us treat a one-row data frame as a vector:

# plyr:: prefix avoids attaching plyr, which would mask dplyr functions
df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))

Note that some functions, such as var, won't work on a one-row data frame, so the row must first be converted to a vector using as.numeric.
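
As an illustration of the as.numeric conversion just mentioned, a minimal sketch, assuming plyr is installed (it is called with the plyr:: prefix rather than attached, to avoid masking dplyr functions):

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  select(-y) %>%
  # each piece passed to the function is a one-row data frame;
  # as.numeric() flattens it to a plain vector before calling var()
  plyr::adply(1, function(df) c(v1 = var(as.numeric(df[1, ]))))
```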

qwr