
I'm writing a short piece of code that compares two data frames, list and knownlocation. For each row in list, I want to know whether it falls within any of the regions in knownlocation.

```r
colnames(list) <- c("gene_symbol", "chromo", "start", "end")
colnames(knownlocation) <- c("snp", "chr", "s", "e")
```

To find this, I wrote the following code, which adds a new column to list containing TRUE or FALSE depending on whether the row falls within any of the knownlocation regions:

```r
for (i in 1:nrow(list)) {
  for (j in 1:nrow(knownlocation)) {
    if ( (list[i, 2] == knownlocation[j, 2]) && (list[i, 3] >= knownlocation[j, 3]) && (list[i, 4] <= knownlocation[j, 4]) ) {
      list[i, 5] = "TRUE"
    } else {
      list[i, 5] = "FALSE"
    }
  }
}
```

This code looks fine to me and runs without errors. The problem is that every row of list shows FALSE, even rows that do fall within a location in knownlocation. Can anyone spot something obvious that I'm missing?
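A minimal reproducible example makes the symptom easy to see. The data below is hypothetical (only the column names come from the question): GENE_A lies inside snp1, yet after running the loop from the question every row of list is marked FALSE.

```r
# Hypothetical toy data: only the column names come from the question.
# GENE_A (chromosome 1, 150-180) lies inside snp1 (chromosome 1, 100-200).
list <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B"),
  chromo      = c("1", "2"),
  start       = c(150, 500),
  end         = c(180, 600),
  stringsAsFactors = FALSE
)
knownlocation <- data.frame(
  snp = c("snp1", "snp2"),
  chr = c("1", "3"),
  s   = c(100, 1000),
  e   = c(200, 2000),
  stringsAsFactors = FALSE
)

# The loop from the question, unchanged apart from indentation
for (i in 1:nrow(list)) {
  for (j in 1:nrow(knownlocation)) {
    if ((list[i, 2] == knownlocation[j, 2]) && (list[i, 3] >= knownlocation[j, 3]) &&
        (list[i, 4] <= knownlocation[j, 4])) {
      list[i, 5] = "TRUE"
    } else {
      list[i, 5] = "FALSE"
    }
  }
}

# Column 5 ends up "FALSE" for both rows, including GENE_A
print(list)
```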

zx8754
ClareFG

1 Answer


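The issue is that the else clause overwrites earlier TRUE results with FALSE. For each row i, the inner loop visits every row of knownlocation, so whenever a later row j does not match, the TRUE stored for an earlier match is replaced by FALSE. In effect, only the last row of knownlocation decides the result. You only want to store TRUE when the condition is met, and to initialize all values to FALSE before the loop runs.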
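The "last iteration wins" effect can be shown without any data frames. In this sketch, the vector matches is a hypothetical stand-in for the comparison results of one list row against three knownlocation rows, where only the first row matches:

```r
# Hypothetical match results of one list row against three knownlocation rows:
# only the first knownlocation row matches.
matches <- c(TRUE, FALSE, FALSE)

# Pattern from the question: every iteration assigns, so the last one wins
buggy <- NA
for (m in matches) {
  if (m) {
    buggy <- TRUE
  } else {
    buggy <- FALSE
  }
}

# Corrected pattern: start from FALSE and only ever set TRUE
fixed <- FALSE
for (m in matches) {
  if (m) {
    fixed <- TRUE
  }
}

print(c(buggy = buggy, fixed = fixed))  # buggy FALSE, fixed TRUE
```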

Try removing the else clause.

To initialize column 5 of list to FALSE, run the following right before the nested loops:

```r
list$V5 <- FALSE
```

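Code:

```r
list$V5 <- FALSE  # start with every row marked FALSE
for (i in seq_len(nrow(list))) {
  for (j in seq_len(nrow(knownlocation))) {
    # same chromosome, and the gene lies entirely inside the known region
    if (list[i, 2] == knownlocation[j, 2] &&
        list[i, 3] >= knownlocation[j, 3] &&
        list[i, 4] <= knownlocation[j, 4]) {
      list[i, 5] <- TRUE  # only ever set TRUE, never reset to FALSE
    }
  }
}
```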
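For larger inputs, the nested loop with element-by-element data frame indexing can be slow. One loop-free alternative (a swapped-in technique, not part of the original answer) compares each row of list against all rows of knownlocation at once and collapses the comparisons with any(). The toy data is hypothetical; only the column names come from the question.

```r
# Hypothetical toy data: only the column names come from the question.
list <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B"),
  chromo      = c("1", "2"),
  start       = c(150, 500),
  end         = c(180, 600),
  stringsAsFactors = FALSE
)
knownlocation <- data.frame(
  snp = c("snp1", "snp2"),
  chr = c("1", "3"),
  s   = c(100, 1000),
  e   = c(200, 2000),
  stringsAsFactors = FALSE
)

# For each row i, build a logical vector over all knownlocation rows
# (same chromosome AND start/end inside the region), then ask whether any is TRUE.
list$V5 <- vapply(seq_len(nrow(list)), function(i) {
  any(knownlocation$chr == list$chromo[i] &
        list$start[i] >= knownlocation$s &
        list$end[i] <= knownlocation$e)
}, logical(1))

print(list)  # V5 is TRUE for GENE_A and FALSE for GENE_B
```

Because any() returns TRUE as soon as a single region matches, there is no per-iteration assignment that could overwrite an earlier match.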
Vince