Calculate row means on subset of columns

Question

Given a sample data frame:

C1<-c(3,2,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,3)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)

DF
    ID C1 C2 C3
  1  A  3  3  5
  2  B  2  7  4
  3  C  4  3  3
  4  D  4  4  6
  5  E  5  5  3

What is the best way to create a second data frame that would contain the ID column and the mean of each row? Something like this:

ID  Mean
A    3.66
B    4.33
C    3.33
D    4.66
E    4.33

Something similar to:

RM<-rowMeans(DF[,2:4])

I'd like to keep the means aligned with their IDs.

score 65 · Accepted Answer · edited Mar 30 '17 at 00:19

Calculate row means on a subset of columns:

Create a new data.frame that takes the first column of DF as a column called ID, calculates the mean of all the other columns in each row, and puts the result into a column called 'Means':

data.frame(ID=DF[,1], Means=rowMeans(DF[,-1]))
  ID    Means
1  A 3.666667
2  B 4.333333
3  C 3.333333
4  D 4.666667
5  E 4.333333
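
A variation on the same idea, as a minimal sketch: the columns are picked by name instead of by position, so the result does not depend on column order. The `cols` vector and the `na.rm = TRUE` setting are assumptions added for illustration; neither appears in the answer above.

```r
# Sample data from the question
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# Columns to average, chosen by name (assumed for this example)
cols <- c("C1", "C2", "C3")

# na.rm = TRUE skips missing values instead of returning NA for the whole row
result <- data.frame(ID = DF$ID, Means = rowMeans(DF[, cols], na.rm = TRUE))
result
```

Because `ID` and `Means` are built from the same rows of `DF`, each mean stays aligned with its ID.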

Eric Leschinski

answered Jun 08 '12 at 12:06

Jilber Urbina

and Eric Leschinski: if we want the row-wise standard deviation or, say, the 90th percentile instead, can we use something like rowSDs in the same way? – Kumar Nov 21 '22 at 07:06

BenBarnes · Answer 2 · 2015-07-26T09:27:04.527

31

Starting with your data frame DF, you could use the data.table package:

library(data.table)

## EDIT: As suggested by @MichaelChirico, setDT converts a
## data.frame to a data.table by reference and is preferred
## if you don't mind losing the data.frame
setDT(DF)

# EDIT: To get the column name 'Mean':

DF[, .(Mean = rowMeans(.SD)), by = ID]

#      ID     Mean
# [1,]  A 3.666667
# [2,]  B 4.333333
# [3,]  C 3.333333
# [4,]  D 4.666667
# [5,]  E 4.333333

edited Jul 26 '15 at 09:27

answered Jun 08 '12 at 09:03

BenBarnes


Thanks. Also note from `class(DF)` that you don't _lose_ the `data.frame`, in the sense that any function looking for a `data.frame` object should accept `DF` after `setDT` (especially now that `data.table` is on the mature side) – MichaelChirico Jul 26 '15 at 14:40
What if I want instead the row mean between C2 and C3 only? – user3841581 Feb 23 '16 at 22:44
Then you can use `DF[, .(Mean = rowMeans(.SD)), by = ID, .SDcols = c("C2", "C3")]`. The argument `.SDcols` determines which columns you want to include in `.SD`. @user3841581 – BenBarnes Feb 24 '16 at 12:38
@BenBarnes In my case I am not sure how many columns I want to take rowMeans over; there could be 196 in some cases and 198 in others. But one thing they have in common is the start of their names, like Mgw.1, Mgw.2 ... Mgw.196 and similarly Hel.1, Hel.2 ... Hel.198. What I want to do is leave the first 5 columns of the data.table untouched, then take the rowMeans of all columns whose names start with Mgw and assign the result to MGW (deleting the individual columns and keeping only the one with the mean value), and so on for the rest of the columns. Can you guide me on how to do that? – Newbie Jul 27 '16 at 15:39
@Newbie that sounds like a new question, which you should post on its own. – BenBarnes Jul 27 '16 at 15:48
@BenBarnes Can you kindly help me with [this](http://stackoverflow.com/questions/38618110/calculate-rowmeans-on-a-range-of-column-variable-number) question – Newbie Jul 27 '16 at 16:05
@BenBarnes Hi, I want to find the mean of every 10 columns of my data (which has 1000 columns and some NA values). How should I do it? Can you please guide me? Thanks :) – Shalen May 07 '20 at 17:18
@Shalen, I'd recommend asking a new question. It's a bit too involved for a comment and not entirely clear what you expect as output: `rowMeans` or the grand mean. Plus it doesn't look like you have an `ID` column. – BenBarnes Jun 02 '20 at 10:34
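
To illustrate the `.SDcols` suggestion from the comments, here is a minimal sketch on the question's data. It assumes the data.table package is installed, and it keeps `ID` as a regular column in `j` instead of grouping by it, which is a choice made for this example and not part of the answer.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
DT <- data.table(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# .SDcols restricts .SD to C2 and C3, so rowMeans() only sees those two columns;
# ID can still be referenced in j because it is an ordinary column of DT
res <- DT[, .(ID, Mean = rowMeans(.SD)), .SDcols = c("C2", "C3")]
res
```

Changing the character vector passed to `.SDcols` is all that is needed to average a different subset.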

score 24 · Answer 3 · edited Dec 02 '16 at 20:52

You can create a new column in your data frame with $ to hold the row means:

DF$Mean <- rowMeans(DF[,2:4])

Jilber Urbina

answered Apr 16 '16 at 03:08

Nadegelia

zx8754 · Answer 4 · 2022-10-24T07:29:06.507

13

Using dplyr:

library(dplyr)

DF %>%
  transmute(ID,
            Mean = rowMeans(across(C1:C3)))

Or

DF %>%
  transmute(ID,
            Mean = rowMeans(select(., C1:C3)))

#   ID     Mean
# 1  A 3.666667
# 2  B 4.333333
# 3  C 3.333333
# 4  D 4.666667
# 5  E 4.333333

edited Oct 24 '22 at 07:29

answered May 15 '18 at 14:59

zx8754

I would update this to use `across` instead of `select`. `across` still allows for tidy select helpers. The output of `across` is a data frame so `rowMeans` will still work on it and you do not need the dot notation. – LMc Oct 21 '22 at 22:34

score 2 · Answer 5 · answered Oct 12 '20 at 19:42

rowMeans is nice, but if you are still trying to wrap your head around the apply family of functions, this is a good opportunity to begin understanding it.

DF['Mean'] <- apply(DF[,2:4], 1, mean)

Notice I'm doing a slightly different assignment than the first example. This approach makes it easier to incorporate it into for loops.
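
One reason the apply form can be worth learning is that the function is just an argument, so the same call works for other row-wise statistics. The sketch below is an illustration only: the column names `SD` and `P90` and the choice of statistics are assumptions, picked because base R has no ready-made row-wise equivalents of `rowMeans` for them.

```r
# Sample data from the question
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# Work on the numeric columns only; MARGIN = 1 means "apply over rows"
num <- DF[, c("C1", "C2", "C3")]

DF["Mean"] <- apply(num, 1, mean)
DF["SD"]   <- apply(num, 1, sd)

# Extra arguments after the function are passed on to it (here: probs = 0.9)
DF["P90"]  <- apply(num, 1, quantile, probs = 0.9)

# The same pattern inside a for loop, with the column name built as a string
for (stat in c("min", "max")) {
  DF[stat] <- apply(num, 1, match.fun(stat))
}
DF
```

For the mean itself, `rowMeans` gives the same values and is usually faster on large data.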

score 0 · Answer 6 · answered Oct 24 '22 at 07:35

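Answer adapted from: here, for N different groups of columns

library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Keep only the numeric columns (the original referred to an undefined `df`)
num_cols <- DF %>%
  dplyr::select(where(is.numeric))

# Group columns by name with the trailing digits removed, then average each group
row_means <- num_cols %>%
  split.default(stringr::str_remove(names(num_cols), '[0-9]+$')) %>%
  map(rowMeans) %>%
  setNames(paste0("mean_", names(.)))

# Splice the per-group means into DF as new columns
DF %>%
  mutate(!!!row_means)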
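
The same grouping idea can be sketched in base R, without dplyr or purrr. The data frame `wide` and its `Mgw.`/`Hel.` column names are hypothetical, chosen to resemble the naming pattern described in the comments on the data.table answer; the trailing `.<number>` suffix convention is likewise an assumption.

```r
# Hypothetical wide data: two groups of columns that share a name prefix
wide <- data.frame(ID = c("A", "B"),
                   Mgw.1 = c(1, 2), Mgw.2 = c(3, 4),
                   Hel.1 = c(10, 20), Hel.2 = c(30, 40), Hel.3 = c(50, 60))

# Names of the numeric columns only
num_names <- names(wide)[sapply(wide, is.numeric)]

# Strip the trailing ".<number>" to get each column's group prefix
prefixes <- sub("\\.[0-9]+$", "", num_names)

# split() groups the column names by prefix; rowMeans() runs once per group.
# drop = FALSE keeps a data frame even if a group has a single column.
group_means <- sapply(split(num_names, prefixes),
                      function(cols) rowMeans(wide[, cols, drop = FALSE]))

result <- data.frame(ID = wide$ID, group_means)
result
```

The groups may contain different numbers of columns, since each prefix is averaged independently.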

score 0 · Answer 7 · answered Aug 21 '23 at 10:42

rowwise() in dplyr can be used in such situations

library(dplyr)

DF %>% 
  rowwise() %>% 
  summarise(ID,
            Mean = mean(c_across(C1:C3))) 
#> # A tibble: 5 × 2
#>   ID     Mean
#>   <chr> <dbl>
#> 1 A      3.67
#> 2 B      4.33
#> 3 C      3.33
#> 4 D      4.67
#> 5 E      4.33

If you would still rather use rowMeans, it also works inside a pipe:

DF %>% 
  mutate(Mean = rowMeans(.[-1]))

#>   ID C1 C2 C3     Mean
#> 1  A  3  3  5 3.666667
#> 2  B  2  7  4 4.333333
#> 3  C  4  3  3 3.333333
#> 4  D  4  4  6 4.666667
#> 5  E  5  5  3 4.333333

. is a placeholder in the %>% pipe: it stands for the result of the previous step (here DF), so .[-1] means "everything except the first column".
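
To make the placeholder concrete, the following sketch compares the piped call with the equivalent direct call. It assumes dplyr is installed; the names `piped` and `direct` are introduced only for this comparison.

```r
library(dplyr, warn.conflicts = FALSE)

# Sample data from the question
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# Piped form: `.` refers to whatever is on the left of %>%, i.e. DF
piped <- DF %>%
  mutate(Mean = rowMeans(.[-1]))

# Direct form: the same call with DF written out instead of `.`
direct <- mutate(DF, Mean = rowMeans(DF[-1]))

all.equal(piped, direct)
```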

score -1 · Answer 8 · answered Sep 15 '19 at 09:10

(Another solution, using pivot_longer and pivot_wider from the latest tidyr update.)

Try using pivot_longer to reshape your data from wide to long form. See the tidyr article on pivot_longer and pivot_wider (https://tidyr.tidyverse.org/articles/pivot.html).

library(tidyverse)
C1<-c(3,2,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,3)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)
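
The reshaping step itself is not shown in this answer. A minimal sketch of what it might look like follows; it is a guess at the approach, not the author's code, and it relies on `pivot_longer`'s default output column names `name` and `value`.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Sample data from the question
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

res <- DF %>%
  # Wide to long: one row per (ID, column) pair, values in `value`
  pivot_longer(cols = -ID) %>%
  # Averaging within each ID is then the same as averaging each original row
  group_by(ID) %>%
  summarise(mean = mean(value))
res
```

This only works as a row mean when each ID appears once in the original data; with repeated IDs it would average across all of their rows.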

Output here

  ID     mean
  <fct> <dbl>
1 A      3.67
2 B      4.33
3 C      3.33
4 D      4.67
5 E      4.33
