R ignore missing data

Question

I have two R data files, each with 100 columns; the number of rows per column varies from 220 to 360 in both data1 and data2. data1 and data2 record the changes of two quantities during a set of experiments, so [i,j] of data1 and [i,j] of data2 refer to the same event but hold different values. I want to print every value greater than 2.5 in either file, along with its row and column number.

for (i in 1:360){
  for (j in 1:100){
  if((data1[i,j]>2.5) | ( data2[i,j]>2.5)) {
    cat(i, j, data1[i,j],  data2[i,j], "\n", file="extr-b2.5.txt", append=T)
  }
 }
}

I get this error because of NAs.

Error in if ((data1[i, j] > 2.5) | (data2[i, j] >  : 
  missing value where TRUE/FALSE needed

If I set i to 1:220 (every column has at least 220 rows), it works fine.

How can I modify the above code to ignore NA values?

First thing to do will be probably ditch the `for` loops. Second will be [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — David Arenburg, Jun 17 '14 at 18:20
maybe you should use `?which` for this an use the indices to go back and pull the data — rawr, Jun 17 '14 at 18:22

agstudy · Accepted Answer · 2018-08-25T15:28:26.717

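I would do something like this. `which()` ignores `NA`, so missing values cause no error; note that `&` keeps only cells where both values exceed 2.5, so use `|` if either one should count: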

idx <- which(dat1>2.5 & dat2>2.5,arr.ind=TRUE)
cbind(idx,v1=dat1[idx],v2=dat2[idx])

reproducible example:

set.seed(1)
dat1 <- as.data.frame(matrix(runif(12,1,5),ncol=3))
dat2 <- as.data.frame(matrix(runif(12,1,5),ncol=3))
idx <- which(dat1>2.5 & dat2>2.5,arr.ind=TRUE)
cbind(idx,v1=dat1[idx],v2=dat2[idx])

#      row col       v1       v2
# [1,]   3   1 3.291413 4.079366
# [2,]   4   1 4.632831 2.990797
# [3,]   2   2 4.593559 4.967624
# [4,]   3   2 4.778701 2.520141
# [5,]   4   2 3.643191 4.109781
# [6,]   1   3 3.516456 4.738821

where dat1 and dat2 are:

# dat1
#         V1       V2       V3
# 1 2.062035 1.806728 3.516456
# 2 2.488496 4.593559 1.247145
# 3 3.291413 4.778701 1.823898
# 4 4.632831 3.643191 1.706227
# dat2
#         V1       V2       V3
# 1 3.748091 3.870474 4.738821
# 2 2.536415 4.967624 1.848570
# 3 4.079366 2.520141 3.606695
# 4 2.990797 4.109781 1.502220
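
The same idea carries over to the question's "either file" condition with missing values present. A minimal sketch, assuming two small made-up matrices `m1` and `m2` (hypothetical names and values, not the asker's data): wherever the comparison evaluates to `NA`, `which()` simply skips the cell instead of raising an error.

```r
# Hypothetical 3 x 2 matrices with some missing cells (illustrative values only)
m1 <- matrix(c(1.0, 3.0, NA, 2.0, NA, 4.0), nrow = 3)
m2 <- matrix(c(2.0, 1.0, 3.0, NA, NA, 1.0), nrow = 3)

# `|` keeps cells where either value exceeds 2.5;
# which() drops cells where the result is NA
idx <- which(m1 > 2.5 | m2 > 2.5, arr.ind = TRUE)

# Row, column and both values for every hit
res <- cbind(idx, v1 = m1[idx], v2 = m2[idx])
print(res)
```

Note that `NA | TRUE` is `TRUE` in R, so a cell that is missing in one matrix but above the cutoff in the other is still reported, with `NA` shown for the missing value.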

score 0 · Answer 2 · answered Jun 17 '14 at 18:24

Without the for loops you can use pmax to compare two arrays.

 bigger=pmax(data1,data2)

This gives an array with the element-wise maximum values. Then you can check whether the maximum is bigger than 2.5:

 which( bigger>2.5,arr.ind=T)

will give the location where the max is bigger than your cutoff.
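
One caveat with missing data: by default `pmax()` returns `NA` wherever either input is `NA`, so a cell that is missing in one file but above the cutoff in the other would be skipped. A minimal sketch with `na.rm = TRUE`, assuming numeric matrices `m1` and `m2` (hypothetical names and values; a data frame could be converted with `as.matrix()` first):

```r
# Hypothetical 3 x 2 matrices with some missing cells (illustrative values only)
m1 <- matrix(c(1.0, 3.0, NA, 2.0, NA, 4.0), nrow = 3)
m2 <- matrix(c(2.0, 1.0, 3.0, NA, NA, 1.0), nrow = 3)

# Element-wise maximum; with na.rm = TRUE a cell is NA only if both inputs are NA
bigger <- pmax(m1, m2, na.rm = TRUE)

# Positions where the maximum exceeds the cutoff (NA cells are skipped)
hits <- which(bigger > 2.5, arr.ind = TRUE)
print(hits)
```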

For completeness: if I were to do it in your double-loop framework, I would set the missing values to the minimum of all the other data. This works as long as you have a value below 2.5 somewhere in your data.

# na.rm=TRUE is needed or min() returns NA; unlist() makes this work for data frames too
lowest=min(unlist(data1),unlist(data2),na.rm=TRUE)
data1[is.na(data1)]=lowest
data2[is.na(data2)]=lowest

Then run your double loop.
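
A different way to keep the loop, not the replacement approach described above, is to leave the data untouched and wrap the condition in `isTRUE()`, which turns an `NA` result into `FALSE`. A minimal sketch, assuming small made-up matrices `m1` and `m2` (hypothetical names and values) and collecting the hits in a character vector rather than writing to a file:

```r
# Hypothetical 3 x 2 matrices with some missing cells (illustrative values only)
m1 <- matrix(c(1.0, 3.0, NA, 2.0, NA, 4.0), nrow = 3)
m2 <- matrix(c(2.0, 1.0, 3.0, NA, NA, 1.0), nrow = 3)

hits <- character(0)
for (i in seq_len(nrow(m1))) {
  for (j in seq_len(ncol(m1))) {
    # isTRUE() is FALSE when the comparison is NA, so if() never sees a missing value
    if (isTRUE(m1[i, j] > 2.5 | m2[i, j] > 2.5)) {
      hits <- c(hits, paste(i, j, m1[i, j], m2[i, j]))
    }
  }
}
print(hits)
```

Using `seq_len(nrow(m1))` instead of a hard-coded `1:360` also keeps the loop within the actual dimensions of the data.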
