
Can anyone tell me what’s preventing this loop from running?

For each row i in column 3 of the data frame 'depth.df', the loop performs a calculation using a second data frame, 'linker.df': it multiplies the value in row i by a constant divided by a value from linker.df, which is found by matching the value in row i.

If I run the loop for a single value (let's say 50), it runs fine:

cor.depth <- function(depth.df){
result  <- seq(from=1, to=(nrow(depth.df))) 
x <- 8971
for(i in 1:nrow(depth.df)){ 
       result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] == 50]))
    return(result)  
 }
}


>97,331

but if I run it to loop over each instance of i, it always returns an error:

cor.depth <- function(depth.df){
result  <- seq(from=1, to=(nrow(depth.df))) 
x <- 8971
for(i in 1:nrow(depth.df)){ 
       result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
    return(result)  
 }
}

Error in result[i] <- depth.df[i, 3] * (all_SC_bins/(depth.ea.bin.all[, : replacement has length zero

EDIT

Here is a reproducible data set provided to illustrate data structure and issue

#make some data as an example 
linker.data <- sample(x=40:50, replace = FALSE)

linker.df <- data.frame(
  X = linker.data
,   Y = sample(x=2000:3000, size = 11, replace = TRUE)
)

depth.df <- data.frame(
  X = sample(x=9000:9999, size = 300, replace = TRUE)
  ,   Y = sample(x=c("A","G","T","C"), size = 300, replace = TRUE)
  ,   Z = sample(linker.data, size = 300, replace = TRUE)
)



 cor.depth <- function(depth.df){
  result  <- seq(from=1, to=(nrow(depth.df))) 
  x <- 8971
  for(i in 1:nrow(depth.df)){ 
    result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
    return(result)  
  }
}
  • Probably one of the instances in the data.frame return `NULL`. Meaning only one specific `i` would cause this error and you need to find that. You can do this by printing `i` to the console and see till when for loop is running. – M-- Jun 16 '17 at 21:27
  • Thanks Masoud- That was a good idea but I checked for NULL and NA values on your recommendation and there are no NULL or NA values present in either data frame. – user8173816 Jun 16 '17 at 22:02
  • No. You got me wrong. I meant within the for loop, your mathematical function return `NULL` or something. Add: `print(i)` at the first line in your for loop and see what's the last `i` that gets printed. – M-- Jun 16 '17 at 22:04
  • You can also provide a reproducible example. Please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Jun 16 '17 at 22:05
  • Thanks Masoud! I have edited the post to include a reproducible example. Adding print(i) to the loop creates a list with the same number of iterations as nrow(depth.df), but I fear I may be misunderstanding you still. I can't get it to print any output at all except for the empty seq() list, unless i put the values in manually (as in the example with == 50). – user8173816 Jun 16 '17 at 23:47

1 Answer


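The error arises because the denominator evaluates to a zero-length vector (integer(0) or numeric(0)) whenever the condition is FALSE, which it is on most rows. Your loop compares row i of linker.df with row i of depth.df, so it only finds a match when the X and Z values happen to agree at the same row number. You likely intended to match against any row of linker.df, which entails a second, nested loop with an if condition on matches. Note also that return(result) belongs after the loop: placed inside it, the function exits on the first iteration.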

cor.depth <- function(depth.df){
  result  <- seq(from=1, to=(nrow(depth.df))) 
  x <- 8971
  for(i in 1:nrow(depth.df)){ 
    for (j in 1:nrow(linker.df)){
      if (linker.df[j,1] == depth.df[i,3]) {
          result[i] <- depth.df[i,3]*(x /( linker.df[j,2]))
      }
    }
  }
  return(result)
}
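To see where "replacement has length zero" comes from, here is a minimal sketch using a toy lookup table (the three rows and the value 42 are illustrative assumptions, not the question's data). It reproduces the zero-length denominator described above and catches the resulting error with tryCatch so the script keeps running:

```r
# Toy lookup table; values are made up for illustration only.
linker.df <- data.frame(X = c(40, 41, 42), Y = c(2000, 2500, 3000))

# Row 1 holds X = 40, so comparing it with 42 gives FALSE,
# and indexing a single value with FALSE drops it entirely.
denom <- linker.df[1, 2][linker.df[1, 1] %in% 42]
length(denom)   # zero-length vector

# Arithmetic on a zero-length vector stays zero-length.
value <- 42 * (8971 / denom)
length(value)

# Assigning a zero-length value into one slot raises the error;
# tryCatch captures its message instead of stopping the script.
result <- numeric(3)
msg <- tryCatch({
  result[1] <- value
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

The nested loop avoids this because it only assigns to result[i] when a matching row in linker.df actually exists.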

Nonetheless, consider merge, a more efficient, vectorized approach that matches rows between the two data frames on their shared ids. The setNames calls below rename the columns to avoid duplicate headers:

mdf <- merge(setNames(linker.df, paste0(names(linker.df), "_l")), 
             setNames(depth.df, paste0(names(depth.df), "_d")), 
                      by.x="X_l", by.y="Z_d")

mdf$result <- mdf$X_l * (8971 / mdf$Y_l)

As a comparison, the nested loop and the merge give the same results once both are sorted on the matching key:

depth.df$result <- cor.depth(depth.df)

depth.df <- with(depth.df, depth.df[order(Z),])   # ORDER BY Z    
mdf <- with(mdf, mdf[order(X_l),])                # ORDER BY X_L

all.equal(depth.df$result, mdf$result)
# [1] TRUE
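A further vectorized option, offered here as a sketch rather than as part of the original answer, is base R's match(), which returns for each Z the row of linker.df holding the same X. It keeps depth.df in its original row order, so no sorting is needed. The seed below is an assumption added only to make the example repeatable; the data mirror the question's reproducible example.

```r
set.seed(1)  # assumed seed, only for repeatability

# Data shaped like the question's reproducible example
linker.df <- data.frame(
  X = sample(40:50),
  Y = sample(2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
  X = sample(9000:9999, size = 300, replace = TRUE),
  Y = sample(c("A", "G", "T", "C"), size = 300, replace = TRUE),
  Z = sample(linker.df$X, size = 300, replace = TRUE)
)

x <- 8971

# For each depth row, find the matching linker row (NA if there is none)
idx <- match(depth.df$Z, linker.df$X)

# Same formula as the loop, applied to whole columns at once
depth.df$result <- depth.df$Z * (x / linker.df$Y[idx])

head(depth.df)
```

If some Z values had no counterpart in linker.df, idx would contain NA there and the corresponding result would be NA instead of raising an error.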
Parfait