Matching one value against all values in a dataframe, in iteration based on criteria

Question

This is a small section of my actual data set (mz):

1           2           3           4           5      
57.033      57.0332     57.0333     57.0339     57.03332      
57.033      57.033      57.0335     59.0490     59.04901      
59.0489     59.048      59.0490589  60.0806     60.08   
60.0805     60          60.08       60          60.08059     
60.08053    60.080      60.08       61.0366     61.03947

A second matrix of the same structure:

mz2

1       2           3           4           5
17.26   16.95225    17          17.84       17.79
14      141         143         632         629
630     63          631.337     241.5272    239
539     41          413         412         412
41      240         241         640         56

I need to compare the first value in col 1 with all values in all the other columns; if a value matches my criteria, I add it to the first row of the column it came from. This is repeated for every row: I then take the second value in col 1, compare it against all values in all columns, and add any value that matches the criteria to row 2 of its column.

I tried using for loops but it is quite confusing.

This is my attempt:

x.mz1<-matrix(0,5,5)        
b1.mz=mz[,1]       ##mz is my sample data above        
b2.mz=mz2[,1]    

for (i in length(b1.mz))    
{       
  one.mz=b1.mz[i]    
  one.2=b2.mz[i]    

  for (j in 2:ncol(mz))    
  {    
    two.1=mz[,j]    
    two=mz2[,j]    

   for (k in 1:length(two.1))
   {
  sec.mz=two.1[k]
  sec=two[k]
  cond1[k]<-one.mz-two.1<0.000005
  cond2[k]<-one.2-two<10
  cond.check<-cbind(cond1[k],cond2[k])
  cond.chc<-rbind(cond.check)
  browser()
}
  cond.chk.sum<-apply(cond.chc,1,sum)
  sum.check<-sum(cond.chk.sum==2,na.rm=T)

  if (sum.check==1)
  {
    x.mz1[i,j]=sec.mz
    }

What I tried in my code: I tried to build a logical matrix from the iterations. After all rows in col 2 have been checked against the criteria, the logical matrix should be of size 5x2, with one column per condition. Where both conditions are TRUE, I add the col 2 value to row 1 (if I am comparing the first value in col 1).

I hope this is clear, as I am quite confused after trying out all the looping structures back and forth. Is there an easier way to do this without using so many loops, for example with lapply or some other function?

Output (not exact values, just to give an idea of what I expect):

1               2           3               4               5      
57.03326875     57.03329    0           57.033      57    
57.03329688     0           0           0           59.049   
59.04894556     60.0805     59.049      60          0
60.0805355      0           0           60.080      60.080
60.08053673     61.039281   0           60.09           61.0839

The first column is col 1 of my main matrix, and all other columns are calculated relative to it. If I find the one value among all the rows of a column that matches, I add it to the current row, in the column the value belongs to. A 0 means that no value in that column matched the current value in col 1.

Can you "`dput`" a small bit of both matrices and the expected output? Also outline the criteria more. (column difference < .000005, second matrix diff < 10 ?) — Will, Dec 25 '13 at 05:13
mz1 and mz2 are small sections of the whole matrix, which is huge. I am not sure how best to show the expected output, but I shall add it to the question; if it is not clear, please let me know. Condition one, as an example: row 1, col 1 in mz1 is 57.033. This value should be compared to all rows in col 2 to col 5. The data is such that, for each column, only one value will match both cond 1 and cond 2. So 57.033-57.0332 should be <0.000005. — user2698508, Dec 25 '13 at 05:19
Make a small, reproducible example. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Roman Luštrik, Dec 25 '13 at 10:32
To better understand, as an example, you need to find the row and column that `== 2` in -e.g.- the following matrix? `((mz2[1,1] - mz2[,-1]) < 10) + ((mz[1,1] - mz[,-1]) < 0.000005)`. If so, there are more than one `2`s with the sample data you've provided. — alexis_laz, Dec 25 '13 at 13:07

Eric Green · Answer 1 · 2013-12-25T18:48:43.277

Update: I only checked the next column in my first attempt. I made a small revision to the starting data and edited the loop. See value 1.2.

I'm not sure I fully understand your request, but here's an attempt.

# generate data
  v1 <- c(1.2, 5, 9, 13, 17)
  v2 <- c(1, 1.3, 10, 14, 18)
  v3 <- c(2, 6, 1.4, 15, 1.2)
  v4 <- c(3, 7, 11, 1.5, 1.4)
  v5 <- c(4, 8, 12, 16, 1.5)
  dat <- as.data.frame(cbind(v1, v2, v3, v4, v5))

  dat
      v1   v2   v3   v4   v5
  1  1.2  1.0  2.0  3.0  4.0
  2  5.0  1.3  6.0  7.0  8.0
  3  9.0 10.0  1.4 11.0 12.0
  4 13.0 14.0 15.0  1.5 16.0
  5 17.0 18.0  1.2  1.4  1.5

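dat2 <- dat
for (r in 1:nrow(dat)) {                    # loop through rows
  for (v in 1:length(dat)) {                # loop through columns
    v.check <- v + 1                        # first column to the right of v
    # note: with `<`, the last column is never searched
    while (v.check < length(dat)) {
      if (dat[r, v] %in% dat[, v.check]) {  # exact match anywhere in that column?
        dat2[r, v.check] <- dat[r, v]       # copy the value into row r of that column
        break                               # stop at the first matching column
      } else {
        v.check <- v.check + 1              # otherwise try the next column
      }
    }
  }
}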

dat2
    v1   v2   v3  v4   v5
1  1.2  1.0  1.2 3.0  4.0
2  5.0  1.3  6.0 7.0  8.0
3  9.0 10.0  1.4 1.4 12.0
4 13.0 14.0 15.0 1.5 16.0
5 17.0 18.0  1.2 1.4  1.5
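
Note that `%in%` tests exact equality, whereas the question asks for values that agree within a tolerance. A minimal sketch of a tolerance-based check follows; `near_any` and its default `tol` are hypothetical names and values chosen for illustration, not part of the loop above.

```r
# Toy data from this answer, rebuilt so the block runs on its own
dat <- data.frame(
  v1 = c(1.2, 5, 9, 13, 17),
  v2 = c(1, 1.3, 10, 14, 18),
  v3 = c(2, 6, 1.4, 15, 1.2),
  v4 = c(3, 7, 11, 1.5, 1.4),
  v5 = c(4, 8, 12, 16, 1.5)
)

# Hypothetical helper: TRUE if x lies within tol of any element of values
near_any <- function(x, values, tol = 1e-8) {
  any(abs(values - x) < tol)
}

near_any(1.2, dat$v3)            # 1.2 occurs in v3
near_any(1.2 + 1e-10, dat$v3)    # a slightly perturbed value still counts as a match
(1.2 + 1e-10) %in% dat$v3        # exact matching does not find the perturbed value
```

Swapping `near_any(dat[r, v], dat[, v.check])` in for the `%in%` test would be one way to make the loop tolerant of small numeric differences, assuming a symmetric tolerance is what is wanted.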

score 0 · Answer 2 · answered Dec 25 '13 at 17:24

The data.

# data 
m1 <- structure(list(X1 = c(57.033, 57.033, 59.0489, 60.0805, 60.08053
), X2 = c(57.0332, 57.033, 59.048, 60, 60.08), X3 = c(57.0333, 
57.0335, 59.0490589, 60.08, 60.08), X4 = c(57.0339, 59.049, 60.0806, 
60, 61.0366), X5 = c(57.03332, 59.04901, 60.08, 60.08059, 61.03947
)), .Names = c("X1", "X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, 
-5L))

m2 <- structure(list(X1 = c(17.26, 14, 630, 539, 41), X2 = c(16.95225, 
141, 63, 41, 240), X3 = c(17, 143, 631.337, 413, 241), X4 = c(17.84, 
632, 241.5272, 412, 640), X5 = c(17.79, 629, 239, 412, 56)), .Names = c("X1", 
"X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA, 
-5L))

Computing the results of the conditions as logical objects.

# first columns
m1.c1 <- m1[,1]
m2.c1 <- m2[,1]

# first condition
res1 <- lapply(m1.c1,FUN=function(x){x-m1[,-1] < 0.000005})
# second condition
res2 <- lapply(m2.c1,FUN=function(x){x-m2[,-1] < 10})

# getting final condition as logical
res <- lapply(seq_along(m1.c1), FUN=function(x)(res1[[x]] & res2[[x]]))

That is how to do it with lapply.

Now the res object is a list holding the logical results of your conditions. What else to do with these results is not clear, since your description of the idea and the data provided are not entirely consistent.
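
One possible next step, sketched under assumptions: turn the logical results into a matrix shaped like the expected output in the question. The helper `match_row`, its arguments `tol1` and `tol2`, and the rule "keep a value only when exactly one row in a column satisfies both conditions, otherwise 0" are my reading of the question, not something it confirms.

```r
# Same sample data as above, rebuilt compactly so the block runs on its own
m1 <- data.frame(
  X1 = c(57.033, 57.033, 59.0489, 60.0805, 60.08053),
  X2 = c(57.0332, 57.033, 59.048, 60, 60.08),
  X3 = c(57.0333, 57.0335, 59.0490589, 60.08, 60.08),
  X4 = c(57.0339, 59.049, 60.0806, 60, 61.0366),
  X5 = c(57.03332, 59.04901, 60.08, 60.08059, 61.03947)
)
m2 <- data.frame(
  X1 = c(17.26, 14, 630, 539, 41),
  X2 = c(16.95225, 141, 63, 41, 240),
  X3 = c(17, 143, 631.337, 413, 241),
  X4 = c(17.84, 632, 241.5272, 412, 640),
  X5 = c(17.79, 629, 239, 412, 56)
)

# Hypothetical helper: for row i of column 1, return one value per remaining
# column. Assumed rule: the matching m1 value if exactly one row of that
# column satisfies both conditions, otherwise 0.
match_row <- function(i, tol1 = 0.000005, tol2 = 10) {
  # same signed comparisons as res1 and res2, combined into one logical matrix
  ok <- (m1[i, 1] - as.matrix(m1[, -1]) < tol1) &
        (m2[i, 1] - as.matrix(m2[, -1]) < tol2)
  vapply(seq_len(ncol(ok)), function(j) {
    hits <- which(ok[, j])                        # rows of column j + 1 that pass
    if (length(hits) == 1) m1[hits, j + 1] else 0
  }, numeric(1))
}

# One row of results per value in column 1; column 1 itself is kept as is
rest <- t(vapply(seq_len(nrow(m1)), match_row, numeric(ncol(m1) - 1)))
out <- cbind(m1[, 1], rest)
colnames(out) <- names(m1)
out
```

With the signed differences used here, every larger value also passes a condition, so several rows often match and the cell stays 0. If a symmetric tolerance is intended, wrapping each difference in `abs()` would be the change to make.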
