
I have a data frame and I'm trying to calculate the median for each group separately. When I split the data frame into two groups and calculate the median for each one, I get an NA result.

The data (the first column, which holds the IDs, is omitted):

    x1          x2          x3          x4          x5          x6          x7          y1          y2          y3          y4          y5          y6          y7          y8
    9.488404158 9.470895414 9.282433728 9.366707445 9.955383045 9.640816474 9.606262272 9.329651027 9.434541611 9.473922432 9.311412966 9.3154885   9.434977488 9.470895414 9.764258059
    8.630629966 8.55831075  8.788391003 8.576231135 8.671587906 8.842979993 8.861958856 8.58330436  8.603596508 8.570129609 8.59798922  8.572686772 8.679751791 8.663950953 8.432875347
    9.354748885 9.367668838 9.259952558 9.421538213 9.554635162 9.603744578 9.452197983 9.284228877 9.404607878 9.317737979 9.343115301 9.310644266 9.27227486  9.360337823 9.44706281
    9.944863964 9.950427516 10.19101759 10.07350804 10.03269879 10.1307908  10.03487287 9.74609383  9.886379007 9.775472567 10.036596   9.544738458 9.699611598 9.911962567 9.625804277
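Several commenters ask for a reproducible example. A minimal sketch of turning the posted table into one with `read.table(text = ...)`, assuming whitespace-separated values; the `id` column is a hypothetical stand-in for the omitted ID column, added so the groups land in columns 2-8 and 9-16 as in the code below.

```r
# Paste the posted values as text and parse them (assumes whitespace-separated columns)
txt <- "
x1 x2 x3 x4 x5 x6 x7 y1 y2 y3 y4 y5 y6 y7 y8
9.488404158 9.470895414 9.282433728 9.366707445 9.955383045 9.640816474 9.606262272 9.329651027 9.434541611 9.473922432 9.311412966 9.3154885 9.434977488 9.470895414 9.764258059
8.630629966 8.55831075 8.788391003 8.576231135 8.671587906 8.842979993 8.861958856 8.58330436 8.603596508 8.570129609 8.59798922 8.572686772 8.679751791 8.663950953 8.432875347
9.354748885 9.367668838 9.259952558 9.421538213 9.554635162 9.603744578 9.452197983 9.284228877 9.404607878 9.317737979 9.343115301 9.310644266 9.27227486 9.360337823 9.44706281
9.944863964 9.950427516 10.19101759 10.07350804 10.03269879 10.1307908 10.03487287 9.74609383 9.886379007 9.775472567 10.036596 9.544738458 9.699611598 9.911962567 9.625804277
"
vals <- read.table(text = txt, header = TRUE)

# Hypothetical ID column in position 1, so x1..x7 are columns 2-8 and y1..y8 are 9-16
AT1 <- data.frame(id = seq_len(nrow(vals)), vals)
str(AT1)
```

Posting the output of `dput(AT1)` would serve the same purpose and preserves column types exactly.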

Code:

    rowN <- nrow(AT1)
    MD1 <- vector(length = rowN)
    MD2 <- vector(length = rowN)

    MD1[1:rowN] <- NA
    MD2[1:rowN] <- NA

    # Group x: columns 2-8 (column 1 holds the IDs)
    x <- AT1[, 2:8]
    write.csv(x, "x.csv", row.names = TRUE)
    x <- as.matrix(x)
    for (i in 2:rowN) {
      MD1[i] <- median(x[i, ])
    }
    write.csv(MD1, "MD1.csv", row.names = TRUE)

    # Group y: columns 9-16
    y <- AT1[, 9:16]
    write.csv(y, "y.csv", row.names = TRUE)
    y <- as.matrix(y)
    for (j in 2:rowN) {
      MD2[j] <- median(y[j, ])
    }
    write.csv(MD2, "MD2.csv", row.names = TRUE)
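The NA in the first position comes from the loop bounds: `for(i in 2:rowN)` never visits row 1, so `MD1[1]` keeps the NA assigned by `MD1[1:rowN]<-NA` (the same holds for `MD2`). A minimal sketch of the same loop starting at row 1, using a small made-up matrix as a hypothetical stand-in for `as.matrix(AT1[, 2:8])`:

```r
# Hypothetical stand-in for as.matrix(AT1[, 2:8]): a small numeric matrix
x <- matrix(c(1, 5, 3,
              2, 8, 4), nrow = 2, byrow = TRUE)
rowN <- nrow(x)

# Preallocate a numeric vector of NAs (same idea as MD1[1:rowN] <- NA)
MD1 <- rep(NA_real_, rowN)

# seq_len(rowN) is 1, 2, ..., rowN, so row 1 is filled as well
for (i in seq_len(rowN)) {
  MD1[i] <- median(x[i, ])
}
MD1
# [1] 3 4
```

`seq_len(rowN)` is used instead of `1:rowN` because it also behaves correctly when `rowN` is 0 (it yields an empty sequence rather than `c(1, 0)`).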
Ben Bolker
  • Please show a reproducible example. We can use `aggregate/dplyr/data.table` methods. – akrun Aug 31 '15 at 11:08
  • @akrun Why doesn't the loop work? R also produces an extra column with the index when I write the CSV file, and if I use median(x[,2:7]) I get an error too :( – shawin karim Aug 31 '15 at 11:15
  • You were using `write.csv` with `row.names=TRUE`. Use `row.names=FALSE` if you don't need that extra column. – akrun Aug 31 '15 at 11:25
  • I didn't see you create the `MD1` and `MD2` objects in the code. I would do `MD1 <- numeric(nrow(x))` and the same for `MD2` before the `for` loop step. – akrun Aug 31 '15 at 11:26
  • I have created MD1 and MD2: `MD1<-vector(length=rowN)`, `MD2<-vector(length=rowN)`, `MD1[1:rowN]<-NA`, `MD2[1:rowN]<-NA` – shawin karim Aug 31 '15 at 11:52
  • I showed some methods with a reproducible example. – akrun Aug 31 '15 at 11:57
  • Your data as shown above only has 15 columns, but you're trying to select columns 9-16. You should be getting an "undefined columns selected" error. Can you show the results of `str(AT1)`? – Ben Bolker Aug 31 '15 at 18:06
  • I omitted the first column because it holds the IDs. – shawin karim Aug 31 '15 at 18:08
  • MD1: `x NA 8.671588 9.421538 10.034873 10.387686 9.346154 7.936674 11.431484 10.984427`. So R produces `x`, then `NA`, then the median values for the x group? – shawin karim Aug 31 '15 at 18:13
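On the last question: the leading `x` is a column header, not a value. `write.csv` turns an unnamed vector into a one-column data frame whose column is named `x`, and `row.names=TRUE` adds the index column. A minimal sketch, with a made-up `MD1` vector and temporary files, comparing the question's call with one that writes a named column and no index:

```r
# Hypothetical medians standing in for the loop's MD1 vector
MD1 <- c(3, 4, 5)

# The question's style of call: index column plus a default header "x"
default_out <- tempfile(fileext = ".csv")
write.csv(MD1, default_out, row.names = TRUE)
readLines(default_out)[1]

# Wrapping the vector in a named data frame gives the header "MD1";
# row.names = FALSE drops the extra index column
out <- tempfile(fileext = ".csv")
write.csv(data.frame(MD1 = MD1), out, row.names = FALSE)
readLines(out)
```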

1 Answer


It would have been better to show a reproducible example. Based on the loop code, it seems to me that the OP wants the median of each row. Assuming that the median is calculated for columns 2:8 and for 9:16 separately, we convert the 'data.frame' to a 'matrix' (as.matrix) and use rowMedians from the matrixStats package.

    library(matrixStats)

    # Split the columns into the two groups and convert each to a numeric matrix
    x1 <- as.matrix(AT1[2:8])
    x2 <- as.matrix(AT1[9:16])

    rowMedians(x1, na.rm=TRUE)
    #[1] -0.09411013 -0.08554095  0.11953107 -0.26869311  0.33224445

    rowMedians(x2, na.rm=TRUE)
    #[1]  0.10557881 -0.74135403 -0.05876725  0.69230776 -0.21402339

Data:

    set.seed(24)
    m1 <- matrix(rnorm(5*15), ncol=15)
    AT1 <- data.frame(col1= LETTERS[1:5], m1)
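If adding a package is not an option, base R's `apply()` computes the same per-row median without an explicit loop. This is a swapped-in alternative, not part of the answer above, and it is typically slower than `rowMedians()` on large matrices. A sketch on the same simulated data:

```r
# Same simulated data as above
set.seed(24)
m1 <- matrix(rnorm(5*15), ncol=15)
AT1 <- data.frame(col1 = LETTERS[1:5], m1)

x1 <- as.matrix(AT1[2:8])
x2 <- as.matrix(AT1[9:16])

# MARGIN = 1 applies median() to each row; na.rm = TRUE is passed on to median()
med1 <- apply(x1, 1, median, na.rm = TRUE)
med2 <- apply(x2, 1, median, na.rm = TRUE)
med1
med2
```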
akrun
  • The error occurs in the second group (y): There were 50 or more warnings (use warnings() to see the first 50) – shawin karim Aug 31 '15 at 11:31
  • @shawinkarim Without a reproducible example, I can't comment. If I use some standard example, my code should work. – akrun Aug 31 '15 at 11:32
  • I am typing your name, akrun, but it disappears at the beginning? – shawin karim Aug 31 '15 at 11:47
  • @shawinkarim that's because the author of a post is always notified, so starting the message with '@ their_name' is redundant when there's no other comment author :) Just the way SO works, but I agree it could be disturbing at first (Will delete this when I see a +1 on the comment meaning it's been read :p) – Tensibai Aug 31 '15 at 12:46
    @shawinkarim, if you want more detailed answers, please include a reproducible example (see [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) that clearly shows what is going wrong. The answer of @akrun nicely shows that you do not need a `for` loop to calculate the median per row of a matrix, and should also work on your data. – Paul Hiemstra Aug 31 '15 at 14:07
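The warnings in the y group were never diagnosed in this thread. One possible cause, offered only as an assumption since `str(AT1)` was never posted, is a column that was read as text instead of numbers; `str(AT1)`, as Ben Bolker asked, would reveal it. A minimal sketch of finding and converting such columns, on a made-up data frame where `y2` is hypothetically stored as character:

```r
# Hypothetical data: y2 came in as text, e.g. because of a stray "n/a" entry
AT1 <- data.frame(id = 1:3,
                  y1 = c(9.3, 8.6, 9.7),
                  y2 = c("9.4", "8.6", "n/a"),
                  stringsAsFactors = FALSE)
str(AT1)

# Names of the columns that are not numeric
not_num <- names(AT1)[!vapply(AT1, is.numeric, logical(1))]
not_num
# [1] "y2"

# as.character() first so factor columns convert by label, not by level code;
# unparseable values become NA (the coercion warning is suppressed)
AT1[not_num] <- lapply(AT1[not_num],
                       function(col) suppressWarnings(as.numeric(as.character(col))))
str(AT1)
```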