add column with row wise mean over selected columns using dplyr

Question

I have a data frame which contains several variables which got measured at different time points (e.g., test1_tp1, test1_tp2, test1_tp3, test2_tp1, test2_tp2,...).

I am now trying to use dplyr to add a new column to a data frame that calculates the row wise mean over a selection of these columns (e.g., mean over all time points for test1).

I struggle even with the syntax for calculating the mean over explicitly named columns. What I tried without success was:

data %>% ... %>% mutate(test1_mean = mean(test1_tp1, test1_tp2, test1_tp3, na.rm = TRUE)

I would further like to use regex/wildcards to select the column names, so something like

data %>% ... %>% mutate(test1_mean = mean(matches("test1_.*"), na.rm = TRUE)

Use `rowwise()` as described in the answer to [this SO question](http://stackoverflow.com/questions/21818181/applying-a-function-to-every-row-of-a-table-using-dplyr). — eipi10, Jan 26 '15 at 21:43
Thank you all for your great answers! All of them are very useful... As I don't have enough reputation, I could not upvote your answers, but could only select one as the solution, sorry! — user21932, Jan 27 '15 at 00:34
I would strongly recommend learning about [tidy data](http://vita.had.co.nz/papers/tidy-data.html). If your data is a in tidy format, you'll find this sort of operation __much__ easier to do — hadley, Jan 27 '15 at 01:04

score 13 · Accepted Answer · answered Jan 26 '15 at 22:07

13

You can use starts_with inside select to find all columns starting with a certain string.

data %>%
  mutate(test1 = select(., starts_with("test1_")) %>%
           rowMeans(na.rm = TRUE))

answered Jan 26 '15 at 22:07

Sven Hohenstein

80,497
17
145
168

Can you show how to rewrite this using `rowwise` and `mean` instead of `rowMeans`? – beginneR Dec 12 '17 at 12:39
I tried it but return 'Error in rowMeans(., na.rm = TRUE) : 'x' must be numeric'. Maybe using `data %>% mutate(., test_mean=rowMean(select(.,starts_with('test1_'))` is easy to understand. – Cobin Apr 03 '19 at 06:28

talat · Answer 2 · 2015-01-26T22:25:01.577

3

Here's how you could do it in dplyr - I use the iris data as an example:

iris %>% mutate(sum.Sepal = rowSums(.[grep("^Sepal", names(.))]))

This computes rowwise sums of all columns that start with "Sepal". You can use rowMeans instead of rowSums the same way.

edited Jan 26 '15 at 22:25

answered Jan 26 '15 at 21:43

talat

68,970
21
126
157

score 0 · Answer 3 · answered Jan 26 '15 at 21:42

0

Not a dplyr solution, but you can try:

cols_2sum <- grepl('test1',colnames(data))
rowMeans(data[,cols_2sum])

answered Jan 26 '15 at 21:42

Marat Talipov

13,064
5
34
53

add column with row wise mean over selected columns using dplyr

3 Answers3

Linked