39

I would like to get the average for certain columns for each row.

I have this data:

w=c(5,6,7,8)
x=c(1,2,3,4)
y=c(1,2,3)
length(y)=4
z=data.frame(w,x,y)

Which returns:

  w x  y
1 5 1  1
2 6 2  2
3 7 3  3
4 8 4 NA

I would like to get the mean for certain columns, not all of them. My problem is that there are a lot of NAs in my data. So if I wanted the mean of x and y, this is what I would like to get back:

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

I guess I could do something like z$mean=(z$x+z$y)/2 but the last row for y is NA so obviously I do not want the NA to be calculated and I should not be dividing by two. I tried cumsum but that returns NAs when there is a single NA in that row. I guess I am looking for something that will add the selected columns, ignore the NAs, get the number of selected columns that do not have NAs and divide by that number. I tried ??mean and ??average and am completely stumped.

ETA: Is there also a way I can add a weight to a specific column?

eli-k
  • 10,898
  • 11
  • 40
  • 44
thequerist
  • 1,774
  • 3
  • 19
  • 27

3 Answers3

60

Here are some examples:

> z$mean <- rowMeans(subset(z, select = c(x, y)), na.rm = TRUE)
> z
  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4

weighted mean

> z$y <- rev(z$y)
> z
  w x  y mean
1 5 1 NA    1
2 6 2  3    2
3 7 3  2    3
4 8 4  1    4
> 
> weight <- c(1, 2) # x * 1/3 + y * 2/3
> z$wmean <- apply(subset(z, select = c(x, y)), 1, function(d) weighted.mean(d, weight, na.rm = TRUE))
> z
  w x  y mean    wmean
1 5 1 NA    1 1.000000
2 6 2  3    2 2.666667
3 7 3  2    3 2.333333
4 8 4  1    4 2.000000
eli-k
  • 10,898
  • 11
  • 40
  • 44
kohske
  • 65,572
  • 8
  • 165
  • 155
  • 1
    Thanks, this does exactly what I am looking for. I really need to study up on this magical apply command, it seems like it is a solution to everything. – thequerist Feb 28 '12 at 22:30
28

Try using rowMeans:

z$mean=rowMeans(z[,c("x", "y")], na.rm=TRUE)

  w x  y mean
1 5 1  1    1
2 6 2  2    2
3 7 3  3    3
4 8 4 NA    4
Andrew
  • 36,541
  • 13
  • 67
  • 93
  • +1 Thanks, I normally use Extract, cannot believe I did not think of this. Gave the check to kohske for including solution to weighted also. – thequerist Feb 28 '12 at 22:33
  • @andrew How can I get rowMeans of a range of column with some particular name (e.g. MGW.1, MGW.2, MGW.3 ... MGW.198) so all of these columns have name starting with MGW but then number is different and I am not sure about exact number of such columns they could be 196 in one case and 198 in other. What I want is something like: `data.frame(ID=DF[,1:4], MGW=rowMeans(DF[,MGW.*]), HEL=rowMeans(DF[,HEL.*]))` It means the first 4 columns should not be touched while for rest of the columns take average of all columns in category MGW.* and HEL.* – Newbie Jul 27 '16 at 14:42
  • For that, I'd recommend using dplyr and tidyr, since dplyr lets you select columns based on regular expressions. See [this question](https://stackoverflow.com/questions/33401788/dplyr-using-mutate-like-rowmeans) for an example, or [this gist](https://gist.github.com/andrewheiss/dcd38268c65396e1f18de626e70cae47) for a worked example of this question. – Andrew Jul 28 '16 at 18:24
  • @Andrew Hi, I am going to find the mean of every 10 columns of my data (which has 1000 columns) how should I modify it?Can you please guide me?Thanks :) – Shalen May 07 '20 at 16:15
5

Here is a tidyverse solution using c_across which is designed for row-wise aggregations. This makes it easy to refer to columns by name, type or position and to apply any function to the selected columns.

library("tidyverse")

w <- c(5, 6, 7, 8)
x <- c(1, 2, 3, 4)
y <- c(1, 2, 3, NA)
z <- data.frame(w, x, y)

z %>%
  rowwise() %>%
  mutate(
    mean = mean(c_across(c(x, y)), na.rm = TRUE),
    max = max(c_across(x:y), na.rm = TRUE)
  )
#> # A tibble: 4 × 5
#> # Rowwise: 
#>       w     x     y  mean   max
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     5     1     1     1     1
#> 2     6     2     2     2     2
#> 3     7     3     3     3     3
#> 4     8     4    NA     4     4

Created on 2022-06-25 by the reprex package (v2.0.1)

dipetkov
  • 3,380
  • 1
  • 11
  • 19