R dplyr rowwise mean or min and other methods?

Question

How can I use dplyr to get the minimum (or mean) of each row of a data.frame? I want the same result as

apply(mydataframe, 1, mean) 
apply(mydataframe, 1, min)

I've tried

mydataframe %>% rowwise() %>% mean

or

mydataframe %>% rowwise() %>% summarise(mean)

or other combinations, but I always get errors. I don't know the proper way.

I know that I could also use rowMeans, but there is no simple "rowMin" equivalent. There is also the matrixStats package, but most of its functions accept only matrices, not data.frames.

If I want to calculate the row-wise min I could use

do.call(pmin, mydataframe)

Is there anything as simple as this for the row-wise mean?

do.call(mean, mydataframe)

doesn't work, I guess I need a pmean function or something more complex.

Thanks

In order to compare the results we could all work on the same example:

set.seed(124)
df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))

Use `mutate` instead of `summarise`. By the way, `do.call(pmin, mydataframe)` *is* a row-wise `min`; try `do.call(pmin, mtcars[c("gear", "carb")])` for example, so I'm not sure what your issue with it is — David Arenburg, Jul 23 '15 at 22:12
Could you write the full sentence, please? And how do you include options for "mean", for example na.rm=TRUE — skan, Jul 23 '15 at 22:13
For example (for the `mtcars` data): `mtcars %>% rowwise() %>% do(data.frame(., res = mean(unlist(.), na.rm = TRUE)))` — David Arenburg, Jul 23 '15 at 22:28
The time cost of `as.matrix` to use `matrixStats` would be pretty low. Also, something like `mtcars[cbind(1:nrow(mtcars),max.col(-mtcars))]` works to find a minimum in each row. — thelatemail, Jul 24 '15 at 01:11
@skan you could also simply use `rowMeans` like this `mtcars$mymean = rowMeans(mtcars)` — Veerendra Gadekar, Jul 24 '15 at 11:07
yes, but why not calculate the min.col? It could also be related with selection of columns or something else. — skan, Jul 25 '15 at 09:55

score 17 · Answer 1 · answered Jul 24 '15 at 03:01

I suppose this is what you were trying to accomplish:

df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))

library(dplyr)
df %>% rowwise() %>% mutate(Min = min(A, B, C), Mean = mean(c(A, B, C)))

#             A          B           C        Min        Mean
# 1   1.3720142  0.2156418  0.61260582  0.2156418  0.73342060
# 2  -1.4265665 -0.2090585 -0.05978302 -1.4265665 -0.56513600
# 3   0.6801410  1.5695065 -2.70446924 -2.7044692 -0.15160724
# 4   0.0335067  0.8367425 -0.83621791 -0.8362179  0.01134377
# 5  -0.2068252 -0.2305140  0.23764322 -0.2305140 -0.06656532
# 6  -0.3571095 -0.8776854 -0.80199141 -0.8776854 -0.67892877
# 7   1.0667424 -0.6376245 -0.41189564 -0.6376245  0.00574078
# 8  -1.0003376 -1.5985281  0.90406055 -1.5985281 -0.56493504
# 9  -0.8218494  1.1100531 -1.12477401 -1.1247740 -0.27885677
# 10  0.7868666  0.6099156 -0.58994138 -0.5899414  0.26894694

Molx


How do you avoid specifying every column name? Sometimes there are too many. – skan Jul 24 '15 at 10:46
@skan I thought that `df %>% rowwise() %>% mutate_(Mean = min(names(df)))` should work, but it doesn't: only the first column is considered. And for `mean` it gives an error. No idea why. – Molx Jul 24 '15 at 14:00
if anybody can fix it that would be the nicer solution – skan Jul 24 '15 at 16:22
@Molx3 The problem seems to come down to `df %>% rowwise() %>% mutate_(Min = min(c("A", "B", "C")))` vs `df %>% rowwise() %>% mutate(Min = min(c(A, B, C)))`. The second one works; I guess we need the names function to return non-string values. – skan Jul 24 '15 at 17:03
@skan If you need to use quoted names, you have to use the dplyr functions ending with `_`, like `mutate_`, `summarise_`, etc. – Molx Jul 24 '15 at 21:37
`mutate` expects an `expression(c(A,B,C,...))` so one way to avoid specifying every column name is by using `eval(parse())` like this: `df %>% rowwise() %>% mutate(argmin = which.min(eval(parse(text = sprintf("c(%s)", paste(names(.), collapse = ","))))))` . The suggestion to use `mutate_` does not work for `which.min`. Example: `df %>% rowwise() %>% mutate_(argmin = which.min(c("A","B","C")))`. – rapture Nov 22 '22 at 12:17

cmaher · Answer 2 · 2018-05-06T14:50:57.600

There seems to be talk that some dplyr functions like rowwise could be deprecated in the long term (such rumblings on display here). Instead, certain functions from the map family of functions -- such as the pmap function -- from purrr can be used to perform this sort of calculation:

library(tidyverse)

# pmap() returns a list-column here; pmap_dbl(df, min) would give a plain numeric column
df %>% mutate(Min = pmap(df, min), Mean = rowMeans(.))

#              A          B           C        Min       Mean
# 1  -1.38507062  0.3183367 -1.10363778  -1.385071 -0.7234572
# 2   0.03832318 -1.4237989  0.44418506  -1.423799 -0.3137635
# 3  -0.76303016 -0.4050909 -0.20495061 -0.7630302 -0.4576905
# 4   0.21230614  0.9953866  1.67563243  0.2123061  0.9611084
# 5   1.42553797  0.9588178 -0.13132225 -0.1313222  0.7510112
# 6   0.74447982  0.9180879 -0.19988298  -0.199883  0.4875616
# 7   0.70022940 -0.1509696  0.05491242 -0.1509696  0.2013907
# 8  -0.22935461 -1.2230688 -0.68216549  -1.223069 -0.7115296
# 9   0.19709386 -0.8688243 -0.72770415 -0.8688243 -0.4664782
# 10  1.20715377 -1.0424854 -0.86190429  -1.042485 -0.2324120

Mean is a special case (hence the use of the base function rowMeans): mean expects a single vector rather than several separate arguments, and the data.frame method for mean was made defunct in R 3.0.0.
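A minimal sketch of the row-wise mean with purrr, assuming the question's example data (`set.seed(124)`) and that purrr is installed. `pmap_dbl()` hands each row's values to the function as separate arguments, so the lambda gathers them back into one vector with `c(...)` before calling `mean()`. The name `res` is only illustrative.

```r
library(dplyr)
library(purrr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

res <- df %>%
  mutate(
    # min() accepts any number of arguments, so it can be passed directly
    Min  = pmap_dbl(df, min),
    # mean() wants one vector, so bundle the row's values with c(...)
    Mean = pmap_dbl(df, ~ mean(c(...)))
  )
```

Both new columns are plain numeric vectors, which is the practical difference from `pmap()`, whose result is a list.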

For mean in pmap, can also use `pmap_dbl(., ~mean(c(...)))` per https://stackoverflow.com/a/50240617/3217870 — Paul, Jan 22 '19 at 21:55

score 6 · Answer 3 · answered Jan 12 '21 at 01:53

Since dplyr 1.0.0 you can use rowwise() with c_across():

library(dplyr)

df %>%
  rowwise() %>%
  mutate(Min = min(c_across(A:C)), 
          Mean = mean(c_across(A:C)))

#       A      B       C    Min   Mean
#     <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
# 1 -1.39    0.318 -1.10   -1.39  -0.723
# 2  0.0383 -1.42   0.444  -1.42  -0.314
# 3 -0.763  -0.405 -0.205  -0.763 -0.458
# 4  0.212   0.995  1.68    0.212  0.961
# 5  1.43    0.959 -0.131  -0.131  0.751
# 6  0.744   0.918 -0.200  -0.200  0.488
# 7  0.700  -0.151  0.0549 -0.151  0.201
# 8 -0.229  -1.22  -0.682  -1.22  -0.712
# 9  0.197  -0.869 -0.728  -0.869 -0.466
#10  1.21   -1.04  -0.862  -1.04  -0.232
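The question also asks how to pass options such as `na.rm = TRUE`. With `c_across()` they go straight into the summary function, as in this sketch. The `NA` is a hypothetical value injected only for illustration, and the closing `ungroup()` is a common habit rather than something this answer requires.

```r
library(dplyr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))
df$B[3] <- NA  # hypothetical missing value, added only for illustration

res <- df %>%
  rowwise() %>%
  mutate(
    Min  = min(c_across(A:C), na.rm = TRUE),
    Mean = mean(c_across(A:C), na.rm = TRUE)
  ) %>%
  ungroup()  # drop the row-wise grouping so later verbs act on whole columns
```

On large data frames the row-by-row evaluation can be slow, so vectorised helpers such as `rowMeans()` or `pmin()` may be preferable when they cover the function you need.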

JasonAizkalns · Answer 4 · 2017-10-30T13:20:47.933

How about this?

library(dplyr)
as.data.frame(t(mtcars)) %>%
  summarise_all(mean)  # funs() is deprecated since dplyr 0.8.0; pass the function directly

For extra clarity, you could add another t() at the end:

as.data.frame(t(mtcars)) %>%
  summarise_all(mean) %>%
  t()
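As a rough check of the transpose idea, the sketch below confirms that column means of the transposed data match the row means, and shows one way to keep the result next to the original columns. It assumes dplyr 1.0.0 or later for `across()`; the names `via_transpose`, `mtcars_with_mean`, and `row_mean` are made up for this example.

```r
library(dplyr)

# Column means of the transposed data are the row means of the original data.
via_transpose <- as.data.frame(t(mtcars)) %>%
  summarise(across(everything(), mean)) %>%
  unlist()

# rowMeans() reaches the same numbers without transposing, and mutate()
# keeps them next to the original columns.
mtcars_with_mean <- mtcars %>%
  mutate(row_mean = rowMeans(mtcars))
```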

edited Oct 30 '17 at 13:20

answered Jul 23 '15 at 22:51


Could you get something as easy but introducing the resulting column on the original data.frame? – skan Jul 24 '15 at 10:51
This calculates the mean for each _column_, not _row_! So it doesn't answer the original question. And btw, `summarise_each` has since been deprecated in favor of the more specific `summarise_all` and `summarise_at`. – Salim B Oct 29 '17 at 23:47
@SalimB are you sure? I have a transpose `t()` in there that ensures we are summarizing rows not columns. I will update to the non-deprecated version of `summarise_x` – JasonAizkalns Oct 30 '17 at 13:20
@JasonAizkalns Ah, now I see! Sorry, my bad, didn't notice the `t()`. Your answer is correct, of course. The additional `t()` at the end you've added improves clarity a lot :) – Salim B Oct 30 '17 at 17:10

tmfmnk · Answer 5 · 2022-12-19T15:33:41.620

One dplyr and purrr option where the select helpers could be used:

df %>%
 mutate(Min = select(., everything()) %>% reduce(pmin),
        Max = select(., everything()) %>% reduce(pmax))

             A          B           C        Min        Max
1  -1.38507062  0.3183367 -1.10363778 -1.3850706  0.3183367
2   0.03832318 -1.4237989  0.44418506 -1.4237989  0.4441851
3  -0.76303016 -0.4050909 -0.20495061 -0.7630302 -0.2049506
4   0.21230614  0.9953866  1.67563243  0.2123061  1.6756324
5   1.42553797  0.9588178 -0.13132225 -0.1313222  1.4255380
6   0.74447982  0.9180879 -0.19988298 -0.1998830  0.9180879
7   0.70022940 -0.1509696  0.05491242 -0.1509696  0.7002294
8  -0.22935461 -1.2230688 -0.68216549 -1.2230688 -0.2293546
9   0.19709386 -0.8688243 -0.72770415 -0.8688243  0.1970939
10  1.20715377 -1.0424854 -0.86190429 -1.0424854  1.2071538

Or since dplyr 1.0.0:

df %>%
 mutate(Min = reduce(across(everything()), pmin),
        Max = reduce(across(everything()), pmax))
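The same fold can stand in for the "pmean" the question asks about: reducing with `+` gives row sums, and dividing by the column count gives the mean. This is only a sketch, assuming dplyr 1.1.0 or later for `pick()` (with older versions, the `select(., A:C)` form shown above plays the same role) and the question's example data.

```r
library(dplyr)
library(purrr)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

res <- df %>%
  mutate(
    # reduce() folds the columns pairwise: pmin(pmin(A, B), C)
    Min  = reduce(pick(A:C), pmin),
    # the same fold with `+` yields row sums; dividing gives a "pmean"
    Mean = reduce(pick(A:C), `+`) / ncol(pick(A:C))
  )
```

Unlike `rowMeans(na.rm = TRUE)`, the `+` fold propagates `NA`, so any row with a missing value ends up with an `NA` mean.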

This was nice because I could use `ends_with` to calculate the min among only a few columns. `df %>% mutate(mindate = select(., ends_with("date")) %>% reduce(pmin, na.rm=T))` — Dannid, Jul 11 '20 at 00:03

score 0 · Answer 6 · answered Aug 15 '17 at 14:02

I think I found a solution: just transpose your data.frame:

library(dplyr)

x <- tibble(x = rnorm(10),
            y = rnorm(10))  # data_frame() is deprecated in favour of tibble()

# A tibble: 10 × 2
            x            y
        <dbl>        <dbl>
1  -1.1240392  0.9306028477
2  -0.8213379  0.2500495105
3  -0.8289104 -0.3693704483
4  -0.6486601 -1.1421141986
5   0.5098542 -0.3703368343
6  -0.3644690 -0.0003744377
7   0.7404057  0.1166905738
8  -0.2475214 -0.0802864865
9   0.2637841 -0.7717699521
10  1.4092874  0.2998021578

x %>% 
  t() %>% 
  data.frame() %>% 
  mutate_all(min) %>%
  unique() %>% 
  t()

         1
X1  -1.1240392
X2  -0.8213379
X3  -0.8289104
X4  -1.1421142
X5  -0.3703368
X6  -0.3644690
X7   0.1166906
X8  -0.2475214
X9  -0.7717700
X10  0.2998022

If your data frame is big, transposing twice takes a long time and a lot of memory. — skan, Aug 15 '17 at 18:50
Agree. I think it depends on expectations - I am using this approach on data.frames with several thousand rows and tens of columns and it works sufficiently fast. — Jakub Kužílek, Aug 16 '17 at 08:06

score 0 · Answer 7 · answered Nov 22 '22 at 12:46

How do you avoid having to specify every column name? Like this:

set.seed(124)
df <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10))

library(dplyr)

df %>%
  rowwise() %>%
  mutate(Mean = mean(
    eval(
      # The snippet can also be wrapped within a function
      parse(text = sprintf("c(%s)", paste(names(.), collapse = ",")))
    )
  ),
  ArgMin = which.min(
    eval(
      parse(text = sprintf("c(%s)", paste(names(.), collapse = ",")))
    )
  ))

or

getColnamesExpr <- function(df_names) parse(text = sprintf("c(%s)", paste(df_names, collapse = ",")))

df %>%
  rowwise() %>%
  mutate(
    Mean = mean(eval(getColnamesExpr(names(.)))),
    argmin = which.min(eval(getColnamesExpr(names(.))))
  )
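An alternative that avoids `eval(parse())` is to build the column symbols with rlang and splice them in with `!!!`. This is a sketch of a different technique, not a rewrite of the code above; it assumes the rlang package is available, and `cols` and `res` are illustrative names.

```r
library(dplyr)
library(rlang)

set.seed(124)
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

cols <- syms(names(df))  # list of symbols: A, B, C

res <- df %>%
  rowwise() %>%
  mutate(
    Mean   = mean(c(!!!cols)),      # expands to mean(c(A, B, C))
    ArgMin = which.min(c(!!!cols))  # position of the smallest value in the row
  ) %>%
  ungroup()
```

Because `cols` is computed once from `names(df)`, columns added inside the same `mutate()` call are not picked up by later expressions.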
