
I'm sure this is simple, but I'm new to R (like 30 minutes new) and still scratching my head.

I have five columns. I would like to sort the data by a composite of columns a and b (a is chromosome, b is locus) and then average the values over columns sample1, sample2 and sample3 to produce a text output.

So far I have the following, but I think my means calculation is letting me down:

# Import the data as a data frame
df = read.table("mydata.txt")

# Make sure it's there
summary(df)

# Make sure data is sorted by chromosome and by locus.

df = df[order(df[[1]], df[[2]]), ]
# Take the control samples and average each row over three columns, excluding the first two columns; add the per-row means to the data frame

dfmns <- rowMeans( df[ , c("sample1", "sample2", "sample3")] )
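Here is a minimal sketch of the whole pipeline. It assumes `mydata.txt` is whitespace-separated and has a header line like the sample data below. In that case `read.table` needs `header = TRUE`. Without it, the header line is read as a data row, every column becomes text, and there are no columns named `sample1` and so on.

The data are inlined with the `text =` argument so the example is self-contained. The sample columns are the ones the asker later names in the comments (JWA, JWB, JWC, JWD, OE33_F). The output name `means.txt` is a hypothetical placeholder.

```r
# Sample data inlined; with the real file use read.table("mydata.txt", header = TRUE)
df <- read.table(header = TRUE, text = "
chr  leftPos   strand JWA JWB JWC JWD OE33_F
chr1 100202137 +      2   0   1   0   0
chr1 100260304 -      141 62  75  55  20
chr1 100724039 -      0   1   0   0   0
")

# Sort by chromosome, then by locus, referring to the columns by name
df <- df[order(df$chr, df$leftPos), ]

# Per-row mean over the sample columns only (the text columns are left out)
sample_cols <- c("JWA", "JWB", "JWC", "JWD", "OE33_F")
df$Means <- rowMeans(df[, sample_cols])

# Write a tab-separated text file to R's temporary directory;
# "means.txt" is a hypothetical file name
out_file <- file.path(tempdir(), "means.txt")
write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(df)
```

Use a real path instead of `tempdir()` to keep the output file.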

The sample data is as follows:

chr     leftPos    strand  JWA  JWB  JWC  JWD  OE33_F
chr1    100202137  +         2    0    1    0       0
chr1    100260304  -       141   62   75   55      20
chr1    100724039  -         0    1    0    0       0

I would like this output:

chr     leftPos    strand  JWA  JWB  JWC  JWD  OE33_F  Means
chr1    100202137  +         2    0    1    0       0    0.6
chr1    100260304  -       141   62   75   55      20   70.6
chr1    100724039  -         0    1    0    0       0    0.2

I think the code falls over at the `order` function; perhaps I'm not referencing the columns properly?
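In `order`, referring to columns by position (`df[[1]]`) and by name (`df$chr`) both work.

One thing worth checking when sorting by chromosome: if `chr` is text, `order` sorts it alphabetically, so `chr10` comes before `chr2`. Below is a minimal sketch of a numeric chromosome order on hypothetical rows. It assumes names of the form `chr<number>`. Non-numeric names such as `chrX` become `NA`, which `order` places last by default.

```r
# Hypothetical rows spanning several chromosomes
df <- data.frame(
  chr     = c("chr10", "chr2", "chrX", "chr1", "chr2"),
  leftPos = c(500, 300, 100, 900, 100),
  stringsAsFactors = FALSE
)

# Alphabetical order: "chr1" < "chr10" < "chr2" < "chrX"
alpha <- df[order(df$chr, df$leftPos), ]

# Numeric chromosome order: strip the "chr" prefix and convert to integer.
# as.integer("X") is NA (with a warning, suppressed here); order() puts NA keys last
chr_num <- suppressWarnings(as.integer(sub("^chr", "", df$chr)))
natural <- df[order(chr_num, df$leftPos), ]

print(alpha$chr)
print(natural$chr)
```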

Roland
Sebastian Zeki
  • What makes you think the calculation is wrong? We don't have the input data, so we have no idea what that code is returning or what you expect. – MrFlick Aug 05 '14 at 05:06
  • To get better help from us (as @MrFlick started to say), please read about making a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including giving us a representative dataset and output that indicates what you want and what you are instead getting. – r2evans Aug 05 '14 at 05:13
  • chr leftPos strand JWA JWB JWC JWD OE33_F chr1 100202137 + 2 0 1 0 0 chr1 100260304 - 141 62 75 55 20 chr1 100724039 - 0 1 0 0 0 – Sebastian Zeki Aug 05 '14 at 05:35
  • The first 3 samples are JWA JWB JWC yes? How do you get a mean of 2 from c(2,0,1)? – JeremyS Aug 05 '14 at 05:47
  • @user3632206, your example and the code you posted are different. It is not clear which of the columns are "sample1:sample3". I tried a combination of the columns, but was not able to get the "Means" output. – akrun Aug 05 '14 at 06:16
  • @JeremyS Apologies, I had just put data in to show the format; the means were incorrect, but I have now corrected them. – Sebastian Zeki Aug 05 '14 at 06:42
  • @akrun Yes, I'm not sure how to refer to the columns. The columns I want the mean of are JWA, JWB, JWC, JWD and OE33_F. – Sebastian Zeki Aug 05 '14 at 06:43

1 Answer


You may have included some of the character columns in the `rowMeans` calculation. In your example, the character (not-to-be-averaged) columns are in positions 1, 2 and 3, so you can drop them by position:

df$Means <- rowMeans(df[,-(1:3)]) #1:3 refers to the columns `chr` to `strand`
df
#   chr   leftPos strand JWA JWB JWC JWD OE33_F Means
#1 chr1 100202137      +   2   0   1   0      0   0.6
#2 chr1 100260304      - 141  62  75  55     20  70.6
#3 chr1 100724039      -   0   1   0   0      0   0.2
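This small sketch on the question's sample data shows why the character columns matter. `rowMeans` converts a data frame to a matrix. If any column is text, the whole matrix becomes character and `rowMeans` stops with an error, typically "'x' must be numeric".

`leftPos` is also numeric. A blanket "keep only the numeric columns" filter would therefore average the loci in as well, so that column has to be excluded explicitly.

```r
df <- read.table(header = TRUE, text = "
chr  leftPos   strand JWA JWB JWC JWD OE33_F
chr1 100202137 +      2   0   1   0   0
chr1 100260304 -      141 62  75  55  20
chr1 100724039 -      0   1   0   0   0
")

# Including the text columns makes rowMeans() stop with an error
err <- tryCatch(rowMeans(df), error = function(e) conditionMessage(e))
print(err)

# Keeping every numeric column is not enough: leftPos is numeric too
is_num <- vapply(df, is.numeric, logical(1))
print(names(df)[is_num])

# This averages the loci in as well, giving values in the millions
wrong <- rowMeans(df[is_num])
print(wrong)

# Drop the locus column explicitly before averaging
right <- rowMeans(df[is_num & names(df) != "leftPos"])
print(right)
```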

If you only want the mean over a specific set of columns, select them by name:

rowMeans(df[,c("JWA", "JWB", "JWC","JWD", "OE33_F")])
#[1]  0.6 70.6  0.2

Or select them by a name pattern with `grep`:

rowMeans(df[grep("^JW|^OE", colnames(df))])
#[1]  0.6 70.6  0.2
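A variation on the same idea helps if the number or position of the annotation columns differs between files. You can exclude the non-sample columns by name with `setdiff`, which keeps working when sample columns are added. This sketch assumes the annotation columns are always called `chr`, `leftPos` and `strand`, as in the sample data.

```r
df <- read.table(header = TRUE, text = "
chr  leftPos   strand JWA JWB JWC JWD OE33_F
chr1 100202137 +      2   0   1   0   0
chr1 100260304 -      141 62  75  55  20
chr1 100724039 -      0   1   0   0   0
")

# Annotation columns to leave out of the average (assumed names)
id_cols <- c("chr", "leftPos", "strand")

# Every other column is treated as a sample column
sample_cols <- setdiff(names(df), id_cols)
df$Means <- rowMeans(df[sample_cols])
print(df)
```

Compute `sample_cols` before adding the `Means` column. Otherwise, on a second run, `Means` itself would be treated as a sample column.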
akrun