How to average a row over three columns

Question

I'm sure this is simple, but I'm new to R (like 30 minutes new) and still scratching my head.

I have five columns. I would like to sort the data by a composite of columns a and b (a is chromosome, b is locus) and then average the values over columns sample1, sample2 and sample3 to provide a text output.

So far I have the following, but I think my means calculation is letting me down:

#Import the data as a data frame
df = read.table("mydata.txt")

#Make sure it's there
summary(df)

#Make sure data is sorted by chromosome and by locus.
df = df[order(df[[1]], df[[2]]), ]

#Take the control samples and average each row over three columns, excluding the first two columns; add the per-row means to the data frame
dfmns <- rowMeans( df[ , c("sample1", "sample2", "sample3")] )

The sample data is as follows:

chr leftPos strand  JWA JWB JWC JWD OE33_F
chr1    100202137   +   2   0   1   0   0
chr1    100260304   -   141 62  75  55  20
chr1    100724039   -   0   1   0   0   0

I would like

chr      leftPos    strand  JWA JWB JWC JWD OE33_F   Means
chr1    100202137   +          2    0   1   0   0     0.6
chr1    100260304   -        141    62  75  55  20    70.6
chr1    100724039   -          0    1   0   0   0     0.2

I think the code falls over at the order function, as perhaps I'm not referencing the columns properly?

What makes you think the calculation is wrong? We don't have the input data, so we have no idea what the code is returning or what you expect. — MrFlick, Aug 05 '14 at 05:06
To get better help from us (as @MrFlick started to say), please read about making a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including giving us a representative dataset and output that indicates what you want and what you are instead getting. — r2evans, Aug 05 '14 at 05:13
chr leftPos strand JWA JWB JWC JWD OE33_F chr1 100202137 + 2 0 1 0 0 chr1 100260304 - 141 62 75 55 20 chr1 100724039 - 0 1 0 0 0 — Sebastian Zeki, Aug 05 '14 at 05:35
The first 3 samples are JWA JWB JWC yes? How do you get a mean of 2 from c(2,0,1)? — JeremyS, Aug 05 '14 at 05:47
@user3632206. Your example and code you posted are different. It is not clear which of the columns are "sample1:sample3". I tried with a combination of the columns, but was not able to get the "Means" output. — akrun, Aug 05 '14 at 06:16
JeremyS: Apologies, I had just put data in to show the format. The means were incorrect, but I have now corrected them. — Sebastian Zeki, Aug 05 '14 at 06:42
akrun: Yes, I think I'm not sure how to refer to the columns. The columns I want to get a mean of are JWA, JWB, JWC, JWD and OE33_F. — Sebastian Zeki, Aug 05 '14 at 06:43

akrun · Accepted Answer · 2014-08-05T07:30:22.673

1

It is possible that some of the character columns were also included when calculating rowMeans. In your example, you can drop the character columns and any other columns that should not be averaged (here they are in positions 1, 2, and 3):

df$Means <- rowMeans(df[,-(1:3)]) #1:3 refers to the columns `chr` to `strand`
df
#   chr   leftPos strand JWA JWB JWC JWD OE33_F Means
#1 chr1 100202137      +   2   0   1   0      0   0.6
#2 chr1 100260304      - 141  62  75  55     20  70.6
#3 chr1 100724039      -   0   1   0   0      0   0.2
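
As a hedged sketch of the same idea without hard-coded positions, the numeric columns can be found with `sapply(df, is.numeric)`. The data frame is rebuilt here from the sample rows in the question. Because `leftPos` is numeric as well, it is excluded by name; which columns count as samples is an assumption taken from the comments above.

```r
# Rebuild the sample data from the question; header = TRUE keeps the column names
df <- read.table(text = "chr leftPos strand JWA JWB JWC JWD OE33_F
chr1 100202137 + 2 0 1 0 0
chr1 100260304 - 141 62 75 55 20
chr1 100724039 - 0 1 0 0 0", header = TRUE, stringsAsFactors = FALSE)

# Named logical vector: TRUE for every numeric column
num_cols <- sapply(df, is.numeric)

# leftPos is numeric too, but it is a coordinate, not a sample (assumption)
num_cols["leftPos"] <- FALSE

# Average each row over the remaining numeric columns
df$Means <- rowMeans(df[, num_cols])
df$Means  # values: 0.6, 70.6, 0.2
```

This avoids counting column positions by hand, at the cost of having to name any numeric column that is not a sample.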

If you want the mean over only a limited set of columns, select them by name:

rowMeans(df[,c("JWA", "JWB", "JWC","JWD", "OE33_F")])
#[1]  0.6 70.6  0.2

Or

rowMeans(df[grep("^JW|^OE", colnames(df))])
#[1]  0.6 70.6  0.2
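
Putting the pieces together, the whole task from the question (read, sort, average, write a text file) might look like the sketch below. It assumes the file has a header line, as the sample data suggests. Without `header = TRUE`, `read.table` would treat this header as a data row and name the columns `V1` to `V8`, so selecting columns by name would fail. The inline `txt` string and the `tempfile()` output path are stand-ins for the real files.

```r
# Sample rows from the question, deliberately out of order to show the sort
txt <- "chr leftPos strand JWA JWB JWC JWD OE33_F
chr1 100724039 - 0 1 0 0 0
chr1 100202137 + 2 0 1 0 0
chr1 100260304 - 141 62 75 55 20"

# header = TRUE makes the first line the column names
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Sort by chromosome, then by position within each chromosome
df <- df[order(df$chr, df$leftPos), ]

# Average the sample columns, selected by name pattern
sample_cols <- grep("^JW|^OE", colnames(df))
df$Means <- rowMeans(df[, sample_cols])

# Write a tab-separated text file; tempfile() stands in for a real path
out_file <- tempfile(fileext = ".txt")
write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Note that `order(df$chr, ...)` sorts chromosome names as text, so `chr10` comes before `chr2`; a numeric chromosome key would be needed if that matters.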

edited Aug 05 '14 at 07:30

answered Aug 05 '14 at 06:54


Excellent. Exactly what I wanted. Thank you – Sebastian Zeki Aug 05 '14 at 07:04
What if I want to specify which columns to perform the mean on? – Sebastian Zeki Aug 05 '14 at 07:07
@user3632206. How many columns do you want to perform the mean in your actual dataset? – akrun Aug 05 '14 at 07:10
OK, I think I figured it out: I just specify the column indices that I want to run the means on, e.g. to get the mean of columns 4 and 5 it would be df$Means <- rowMeans(df[,(4:5)]). Not sure what the syntax would be if I want to get the mean of columns 4 and 6 only, though. – Sebastian Zeki Aug 05 '14 at 07:12
rowMeans(df[,c(4,6)]) – akrun Aug 05 '14 at 07:13
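
A short sketch of that last point, using the sample rows from the question: non-adjacent columns can be picked by position with `c(4, 6)` or by name, and both give the same result. The `na.rm = TRUE` part is an addition not discussed in the thread; the `NA` is introduced here purely for illustration.

```r
df <- read.table(text = "chr leftPos strand JWA JWB JWC JWD OE33_F
chr1 100202137 + 2 0 1 0 0
chr1 100260304 - 141 62 75 55 20
chr1 100724039 - 0 1 0 0 0", header = TRUE, stringsAsFactors = FALSE)

by_index <- rowMeans(df[, c(4, 6)])          # columns 4 and 6 by position
by_name  <- rowMeans(df[, c("JWA", "JWC")])  # the same two columns by name

# With a missing value, na.rm = TRUE averages the remaining entries
# instead of returning NA for that row
df$JWC[1] <- NA
with_na <- rowMeans(df[, c("JWA", "JWC")], na.rm = TRUE)
```

Selecting by name is more robust if columns are later added or reordered.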
