6

I have a data frame which contains several variables which got measured at different time points (e.g., test1_tp1, test1_tp2, test1_tp3, test2_tp1, test2_tp2,...).

I am now trying to use dplyr to add a new column to a data frame that calculates the row wise mean over a selection of these columns (e.g., mean over all time points for test1).

  1. I struggle even with the syntax for calculating the mean over explicitly named columns. What I tried without success was:

data %>% ... %>% mutate(test1_mean = mean(test1_tp1, test1_tp2, test1_tp3, na.rm = TRUE)

  1. I would further like to use regex/wildcards to select the column names, so something like

data %>% ... %>% mutate(test1_mean = mean(matches("test1_.*"), na.rm = TRUE)

Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
user21932
  • 185
  • 1
  • 6
  • 1
    Use `rowwise()` as described in the answer to [this SO question](http://stackoverflow.com/questions/21818181/applying-a-function-to-every-row-of-a-table-using-dplyr). – eipi10 Jan 26 '15 at 21:43
  • Thank you all for your great answers! All of them are very useful... As I don't have enough reputation, I could not upvote your answers, but could only select one as the solution, sorry! – user21932 Jan 27 '15 at 00:34
  • 1
    I would strongly recommend learning about [tidy data](http://vita.had.co.nz/papers/tidy-data.html). If your data is a in tidy format, you'll find this sort of operation __much__ easier to do – hadley Jan 27 '15 at 01:04

3 Answers3

13

You can use starts_with inside select to find all columns starting with a certain string.

data %>%
  mutate(test1 = select(., starts_with("test1_")) %>%
           rowMeans(na.rm = TRUE))
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
  • Can you show how to rewrite this using `rowwise` and `mean` instead of `rowMeans`? – beginneR Dec 12 '17 at 12:39
  • I tried it but return 'Error in rowMeans(., na.rm = TRUE) : 'x' must be numeric'. Maybe using `data %>% mutate(., test_mean=rowMean(select(.,starts_with('test1_'))` is easy to understand. – Cobin Apr 03 '19 at 06:28
3

Here's how you could do it in dplyr - I use the iris data as an example:

iris %>% mutate(sum.Sepal = rowSums(.[grep("^Sepal", names(.))]))

This computes rowwise sums of all columns that start with "Sepal". You can use rowMeans instead of rowSums the same way.

talat
  • 68,970
  • 21
  • 126
  • 157
0

Not a dplyr solution, but you can try:

cols_2sum <- grepl('test1',colnames(data))
rowMeans(data[,cols_2sum])
Marat Talipov
  • 13,064
  • 5
  • 34
  • 53