What's happening here is that when you remove a row in this way, all the rows below it "move up" to fill the space left behind. When two consecutive rows should both be deleted, the second one moves into the position the loop has just processed, so it gets skipped. Imagine this table:
1 keep
2 delete
3 delete
4 keep
Now, you loop through a sequence from 1 to 4 (the number of rows) deleting rows that say delete:
i = 1, keep that row ...
i = 2, delete that row. Now, the data frame looks like this:
1 keep
2 delete
3 keep
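i = 3, the 3rd row says keep, so keep it ... The row that was originally row 3 now sits in position 2, which the loop has already passed, so it is never checked. The final table is: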
1 keep
2 delete
3 keep
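Here is a minimal sketch that reproduces the walkthrough above. The data frame df and its action column are made up for illustration, and the i <= nrow(df) guard is my own addition so the loop doesn't index past the end once the table has shrunk; your actual code may differ.

```r
# Hypothetical data matching the table above
df <- data.frame(action = c("keep", "delete", "delete", "keep"),
                 stringsAsFactors = FALSE)

# seq_len(nrow(df)) is evaluated once, so i still runs 1, 2, 3, 4
# even though the data frame gets shorter inside the loop
for (i in seq_len(nrow(df))) {
  # guard (an assumption, not from the question): skip positions that no longer exist
  if (i <= nrow(df) && df$action[i] == "delete") {
    df <- df[-i, , drop = FALSE]  # rows below move up by one
  }
}

df$action
# "keep" "delete" "keep"  (the second "delete" was skipped)
```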
In your example with while, however, the deletion step keeps running on row 2 until that row no longer meets the condition, instead of moving on to i = 3 right away. So the process goes:
i = 1, keep that row ...
i = 2, delete that row. Now, the data frame looks like this:
1 keep
2 delete
3 keep
i = 2 (again), delete that row (again). Now, the data frame looks like this:
1 keep
2 keep
i = 2 (again), this row says keep, so keep it and move on to i = 3
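One way to get this "stay on the same row until it passes" behaviour is sketched below. This is not necessarily how your while loop is written; df2 and the action column are hypothetical, and the point is only that i is not advanced after a deletion.

```r
# Hypothetical data matching the table above
df2 <- data.frame(action = c("keep", "delete", "delete", "keep"),
                  stringsAsFactors = FALSE)

i <- 1
while (i <= nrow(df2)) {
  if (df2$action[i] == "delete") {
    # delete and stay on i: the next row has just moved up into this position
    df2 <- df2[-i, , drop = FALSE]
  } else {
    # only advance once the row at position i is one we keep
    i <- i + 1
  }
}

df2$action
# "keep" "keep"
```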
I'd be remiss to answer this question without mentioning that there are much better ways to do this in R, such as square bracket notation (enter ?`[` in the R console), the filter function in the dplyr package, or the data.table package.
This question has many options: Filter data.frame rows by a logical condition
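As a rough sketch of the square bracket approach, using the same made-up table (df3 and action are illustrative names, not from your code): the condition is evaluated for every row at once, so nothing shifts while you are still checking.

```r
# Hypothetical data matching the table above
df3 <- data.frame(action = c("keep", "delete", "delete", "keep"),
                  stringsAsFactors = FALSE)

# Build a logical vector for all rows first, then subset in one step
kept <- df3[df3$action != "delete", , drop = FALSE]

kept$action
# "keep" "keep"

# Rough equivalents, assuming the packages are installed:
#   dplyr::filter(df3, action != "delete")
#   data.table::as.data.table(df3)[action != "delete"]
```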