I am writing a for loop to delete rows in which all of the values in columns 5 through 8 are NA. However, it only deletes some of the rows. When I use a while loop instead, it deletes all of them, but I have to stop it manually (it is an infinite loop, and I have no idea why).

The for/if loop:

    for (i in 1:nrow(df)) {
      if (is.na(df[i, 5]) && is.na(df[i, 6]) &&
          is.na(df[i, 7]) && is.na(df[i, 8])) {
        df <- df[-i, ]
      }
    }

The while loop (which never terminates):

    for (i in 1:nrow(df)) {
      while (is.na(df[i, 5]) && is.na(df[i, 6]) &&
             is.na(df[i, 7]) && is.na(df[i, 8])) {
        df <- df[-i, ]
      }
    }

Can someone help? Thanks!

Jasmine

2 Answers

When you remove a row this way, all the rows below it "move up" to fill the gap. When two consecutive rows should be deleted, the second one is skipped, because it moves into the index the loop has just finished with. Imagine this table:

1 keep
2 delete
3 delete
4 keep

Now, you loop through a sequence from 1 to 4 (the number of rows) deleting rows that say delete:

i = 1, keep that row ...

i = 2, delete that row. Now, the data frame looks like this:

1 keep
2 delete
3 keep

i = 3, the 3rd row says keep, so keep it. (At i = 4 there is no 4th row left, so nothing more is removed.) The final table is:

1 keep
2 delete
3 keep
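The skipping can be reproduced on the four-row table above. This is only a minimal sketch: the column names id and action are made up for illustration, and isTRUE() is added so the comparison stays well-defined once i runs past the last remaining row.

```r
# Hypothetical toy table matching the walk-through above
df <- data.frame(id = 1:4,
                 action = c("keep", "delete", "delete", "keep"),
                 stringsAsFactors = FALSE)

n <- nrow(df)  # the sequence 1:n is fixed when the loop starts
for (i in 1:n) {
  # isTRUE() treats the NA returned by an out-of-range index as "no match"
  if (isTRUE(df$action[i] == "delete")) {
    df <- df[-i, ]  # rows below i shift up by one
  }
}

df  # rows with id 1, 3, 4 remain; the second "delete" row survives
```

The original row 3 slides into position 2 right after the loop has finished with i = 2, so it is never tested.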

In your example with while, however, the deletion step keeps running on row 2 until that row doesn't meet the conditions instead of moving on to i = 3 right away. So the process goes:

i = 1, keep that row ...

i = 2, delete that row. Now, the data frame looks like this:

1 keep
2 delete
3 keep

i = 2 (again), delete that row (again). Now, the data frame looks like this:

1 keep
2 keep

i = 2 (again), this row says keep, so keep it and move on to i = 3
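This is also the likely reason the while version never ends. The for loop still counts up to the original number of rows, so i eventually points past the end of the shortened data frame. Indexing a row that does not exist returns NA, which keeps the condition true, and removing a row that does not exist changes nothing. The sketch below shows both effects on a hypothetical two-row data frame (the column names are illustrative, not from the question).

```r
# Hypothetical two-row data frame: the state after both deletions above
df <- data.frame(id = c(1, 4), action = c("keep", "keep"))

out_of_range <- df[3, "action"]  # row 3 no longer exists
is.na(out_of_range)              # TRUE, so the while condition holds

unchanged <- df[-3, ]            # a negative index past the end removes nothing
nrow(unchanged)                  # still 2, so the condition can never become FALSE
```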


I'd be remiss not to mention that there are much better ways to do this in R, such as logical indexing with square brackets (enter ?`[` in the R console), the filter function in the dplyr package, or the data.table package.

This question has many options: Filter data.frame rows by a logical condition
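As a rough sketch of the square-bracket approach, the loop can be replaced by a single logical index. The example uses the built-in iris data and columns 2 to 4 as a stand-in for the asker's columns 5 to 8; the names cols, all_na, and cleaned are illustrative.

```r
test <- iris
test[1:5, 2:4] <- NA         # make the first five rows all-NA in columns 2 to 4

cols <- 2:4                  # assumed stand-in for the asker's columns 5:8
# TRUE for rows where every value in cols is NA
all_na <- rowSums(is.na(test[, cols])) == length(cols)

cleaned <- test[!all_na, ]   # keep rows that are not entirely NA in cols
nrow(cleaned)
```

Because the rows to drop are computed in one step before anything is removed, no index shifts while the test is being made.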

divibisan

Store the row numbers in a vector and remove the rows after the loop has finished.

test <- iris
test[1:5,2:4] <- NA

> head(test)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1          NA           NA          NA  setosa
2          4.9          NA           NA          NA  setosa
3          4.7          NA           NA          NA  setosa
4          4.6          NA           NA          NA  setosa
5          5.0          NA           NA          NA  setosa
6          5.4         3.9          1.7         0.4  setosa

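x <- integer(0)  # row numbers to drop

for (i in seq_len(nrow(test))) {
  if (is.na(test[i, 2]) && is.na(test[i, 3]) &&
      is.na(test[i, 4])) {
    x <- c(x, i)  # remember the row, delete it later
  }
}
x
# Guard: with nothing to drop, test[-x, ] would return zero rows
if (length(x) > 0) test <- test[-x, ]
head(test)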

> head(test)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
João