
I am trying to take the mean of a list of columns in R and am running into an issue. Let's say I have:

A B  C  D
1 2  3  4
5 6  7  8
9 10 11 12

What I am trying to do is compute the row-wise mean of columns A and C and save it as a new column, say E, and likewise the row-wise mean of columns B and D as a column F. Is that possible? The desired result is:

E   F
2   3
6   7
10  11
Dante Smith

2 Answers


Check out dplyr:

library(dplyr)
# Add E as the mean of A and C, and F as the mean of B and D
df <- df %>% mutate(E=(A+C)/2, F=(B+D)/2)
df

  A  B  C  D  E  F
1 1  2  3  4  2  3
2 5  6  7  8  6  7
3 9 10 11 12 10 11
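If loading dplyr is not an option, the same two columns can be added in base R with `transform()`. A minimal sketch, assuming the data frame is called `df` and holds the example values from the question:

```r
# Example data from the question (assumed column names A, B, C, D)
df <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10),
                 C = c(3, 7, 11), D = c(4, 8, 12))

# Base R counterpart of the mutate() call: no extra packages needed
df <- transform(df, E = (A + C) / 2, F = (B + D) / 2)
df
```

For more than two columns per group, `rowMeans(df[c("A", "C")])` scales better than writing out the sum by hand.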
thc

We can take one subset of the dataset with columns 1 and 2 and another with columns 3 and 4, add them together, divide by 2, and rename the columns with `setNames`:

setNames((df1[1:2] + df1[3:4])/2, c("E", "F"))
#   E  F
#1  2  3
#2  6  7
#3 10 11
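The `setNames` approach relies on column positions (columns 1 and 2 paired with 3 and 4). A sketch that pairs columns by name instead, so the column order does not matter; the lookup list `groups` is a hypothetical name introduced here, and `df1` is assumed to hold the question's data:

```r
df1 <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10),
                  C = c(3, 7, 11), D = c(4, 8, 12))

# Hypothetical lookup: each new column name maps to the columns it averages
groups <- list(E = c("A", "C"), F = c("B", "D"))

# rowMeans() over each group; sapply() returns a matrix with columns E and F
means <- sapply(groups, function(cols) rowMeans(df1[cols]))
as.data.frame(means)
```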

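Another option is `rowMeans`: use a recycled logical vector to split the columns into a list (`c(TRUE, FALSE)` selects columns 1 and 3, its negation selects columns 2 and 4), then loop through the list with `sapply` and take the `rowMeans` of each element.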

i1 <- c(TRUE, FALSE)  # recycled over the columns: i1 selects A and C, !i1 selects B and D
sapply(list(df1[i1], df1[!i1]), rowMeans)
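To see what the recycled logical vector selects, and to get the result back with the column names E and F, the list elements can be named. A sketch, again assuming `df1` holds the question's data (the name `res` is illustrative):

```r
df1 <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10),
                  C = c(3, 7, 11), D = c(4, 8, 12))

i1 <- c(TRUE, FALSE)
# The length-2 logical vector is recycled across the 4 columns
names(df1[i1])   # "A" "C"
names(df1[!i1])  # "B" "D"

# Naming the list elements gives the result matrix the column names E and F
res <- sapply(list(E = df1[i1], F = df1[!i1]), rowMeans)
res
```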

Alternatively, unlist the dataset, reshape it into a 3 x 2 x 2 array (rows x column pair x group), and use `apply` to take the mean across the third dimension:

apply(array(unlist(df1), c(3, 2, 2)), c(1,2), mean)
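The array call above hard-codes 3 rows. A sketch that takes the row count from the data and labels the result; it assumes the same four columns in the order A, B, C, D:

```r
df1 <- data.frame(A = c(1, 5, 9), B = c(2, 6, 10),
                  C = c(3, 7, 11), D = c(4, 8, 12))

# unlist() stacks the columns A, B, C, D end to end (column-major order).
# Reshaping to nrow x 2 x 2 puts A, B in slice 1 and C, D in slice 2,
# so averaging across the third dimension pairs A with C and B with D.
arr <- array(unlist(df1), c(nrow(df1), 2, 2))
res <- apply(arr, c(1, 2), mean)
colnames(res) <- c("E", "F")
res
```

`rowMeans(arr, dims = 2)` averages over the third dimension directly and should give the same values without the `apply` loop.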
akrun
  • If I have a list of columns, say a1, a2, a3, b1, b2, b3, c1, c2, c3, and want to put the mean of a1, a2, a3 into one column, and likewise for b and c, so that each letter gets a single mean column, could I do the same? – Dante Smith Jan 31 '17 at 16:36
  • @DanteSmith In that case do `sapply(split.default(df1, sub("\\d+", "", colnames(df1))), rowMeans)` – akrun Jan 31 '17 at 16:38
  • Where does the \\d+ come into play? Is that splitting the column by number? – Dante Smith Jan 31 '17 at 16:40
  • @DanteSmith It is based on the column names you showed. I see that you have numbers that follow a, b, c etc. We are removing that with `sub` and splitting the dataset columns with prefix 'a', 'b', 'c' – akrun Jan 31 '17 at 16:41
  • Thanks for the quick responses. If the column names were, say, alow, amedium, ahigh and blow, bmedium, bhigh instead of numbered, could you have it take the mean of every 3 columns to produce a = mean(alow, amedium, ahigh)? – Dante Smith Jan 31 '17 at 16:46
  • @DanteSmith Sorry, didn't see your comment. In that case, I would use `sapply(split.default(df1, substr(colnames(df1), 1, 1)), rowMeans)` – akrun Jan 31 '17 at 17:01
  • @DanteSmith BTW, what kind of patterns do you have in the original dataset? If we keep changing the patterns, it will take a lot of time to solve. – akrun Jan 31 '17 at 17:02
  • @akrun Hi, I want to do something like this question, but instead I want to find the mean of every 10 columns of my data (which has 1000 columns and some NA values). How should I do it? Can you please guide me? Thanks :) – Shalen May 07 '20 at 17:58
  • @akrun Sure, thanks... I thought maybe it was too basic ;) – Shalen May 07 '20 at 18:45
  • @akrun I did, didn't I?! – Shalen May 07 '20 at 18:57
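The `split.default()` idea from the comments covers both follow-up questions: group columns by a name prefix, or by position in blocks of k consecutive columns, and pass `na.rm = TRUE` through `sapply()` to `rowMeans()` so missing values are skipped. A sketch on a small made-up data frame (the values, the single `NA`, and `k = 3` are illustrative assumptions; for 1000 columns averaged in blocks of 10, `k` would be 10):

```r
# Hypothetical example: 4 rows, columns a1, a2, a3, b1, ..., c3
df1 <- as.data.frame(matrix(as.numeric(1:36), nrow = 4))
names(df1) <- paste0(rep(c("a", "b", "c"), each = 3), 1:3)
df1$a3[1] <- NA  # one missing value

# Group by prefix: strip the digits so a1, a2, a3 all map to "a"
by_prefix <- sapply(split.default(df1, sub("\\d+", "", names(df1))),
                    rowMeans, na.rm = TRUE)

# Group by position: every k consecutive columns form one group
k <- 3
by_block <- sapply(split.default(df1, ceiling(seq_along(df1) / k)),
                   rowMeans, na.rm = TRUE)

by_prefix
by_block
```

With `na.rm = TRUE`, row 1 of group `a` is the mean of its two non-missing values. The block grouping assumes the groups are consecutive columns; if `ncol(df1)` is not a multiple of `k`, the last group simply has fewer columns.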