2

I've found an excellent solution to a problem I'm having where I want to create a new column that computes the mean of all the cells in the corresponding row here:

https://stackoverflow.com/a/33438918/12744116

The data is admittedly not tidy, but the solution, which I've copied below, gets the job done:

data %>% 
    rowwise() %>% 
    mutate(c=mean(c(a,b)))

#      id     a     b     c
#   (dbl) (dbl) (dbl) (dbl)
# 1   101     1     2   1.5
# 2   102     2     2   2.0
# 3   103     3     2   2.5

However, unlike this simpler example, I have far too many columns to name. I'm wondering if there's any way of quickly referring to the columns using slicing notation (i.e., instead of c(a, b), something like 2:3) or some other way of referring to the columns via their index.

I've found something similar on another Stack Overflow thread here, but the solution has its own problems since we're listing all the column indices instead of the column names. I have way too many columns for me to list them all for each calculation.

Any solutions?

EDIT: I figured one out myself, but I feel like it's too inelegant and I believe I'm maybe extracting the entire column for every row, which is obviously going to be a slower solution than expected:

data %>%
  mutate(id = row_number()) %>%
  rowwise() %>%
  mutate(avg = mean(c(.[id, 2:4], recursive=TRUE)))

Any solutions that are faster?

anonymous1a
  • 785
  • 1
  • 7
  • 12
  • 3
    Why not `rowMeans(data[-1])` ? – Ronak Shah Jan 29 '20 at 06:21
  • @RonakShah you just beat me to it. If its a simple average across all columns but `id`, then his solution can be used as-is. Else you can pass vector of column numbers. – Drj Jan 29 '20 at 06:23
  • @chinsoon12 I guess, sure, that'd be nice for general knowledge, but I would prefer a way of doing it that's within the tidyverse. I could, of course, compute the answer using just rowmeans loops, for example. My larger question is not so much about getting the row means themselves — that's just the particular example that led me to this larger question — but rather about an efficient way of indexing columns with the rowwise() function in tidyverse. – anonymous1a Jan 29 '20 at 06:52
  • 1
    I don't know that I would use "efficient" when dealing with `rowwise` on data; it works, sure, but while some calculations certainly require it, in general you are defeating R's efficiency calculating on whole vectors at a time. I understand that you are using `rowMeans` as a method for understanding how to deal with data in this fashion, but ... perhaps your *real* problem might also not need `rowwise`? – r2evans Jan 29 '20 at 07:53

2 Answers2

3

You can do:

df %>%
 mutate(c = rowMeans(select(., 2:3)))

   id a b   c
1 101 1 2 1.5
2 102 2 2 2.0
3 103 3 2 2.5

Or:

df %>%
 mutate(c = rowMeans(select(., 2:length(.))))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
1

For me using rowMeans seems straightforward without involving tidyverse functions.

data$c <- rowMeans(data[2:3])

however, if you prefer tidyverse solution we can take a bit of help from purrr map functions.

library(dplyr)
library(purrr)

For only two columns

data %>% mutate(c = map2_dbl(a, b, ~mean(c(.x, .y))))

For many columns

data %>%  mutate(c = pmap_dbl(select(., a:b), ~mean(c(...))))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213