There are 42 rows in my dataset (EP) and I want to remove the middle entries for each participant. The following code works, but it gives an error:

Error in if (EP$Name[row] == EP$Name[row + 1]) { : missing value where TRUE/FALSE needed

for (row in 2:length(EP$Name)){

    if(EP$Name[row] == EP$Name[row+1]) 
    {
        if(EP$Name[row]==EP$Name[row-1])
        {
         EP <- EP[-row,]
          print(row)
        }
    }
}
Darren Tsai

2 Answers

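You are getting the error because of the last possible value of row. When row = length(EP$Name), EP$Name[row+1] is out of range and returns NA, so the comparison evaluates to NA and if() cannot decide between TRUE and FALSE.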
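To make the cause concrete, here is a minimal sketch in base R. The vector `name` is a made-up stand-in for EP$Name, not the asker's data.

```r
# Hypothetical stand-in for EP$Name
name <- c("A", "A", "A", "B")
n <- length(name)

past_end <- name[n + 1]        # indexing one past the end returns NA
cmp <- name[n] == past_end     # comparing anything with NA gives NA

print(past_end)
print(cmp)
# if (cmp) { ... } would stop with "missing value where TRUE/FALSE needed"
```

The loop in the question also deletes rows from EP while iterating over the original row count, so row + 1 can run past the end of the shrinking data frame even before the final iteration.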

Your data frame is small, so it does not really matter here, but it is good to know that loops over data frames are best avoided in R. You can have a look at this question to see how to do this without a loop.
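One possible loop-free version in base R is sketched below. It assumes that "middle" means a row whose Name matches both the row before and the row after it, and that Name contains no NA values. The example data and the names `same_as_prev`, `same_as_next` and `EP_trimmed` are illustrative only.

```r
# Hypothetical example data standing in for EP
EP <- data.frame(Name  = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
                 Value = 1:9,
                 stringsAsFactors = FALSE)

n <- nrow(EP)

# Compare each row with its neighbour; the first row has no previous row
# and the last row has no next row, so those positions are set to FALSE.
same_as_prev <- c(FALSE, EP$Name[-1] == EP$Name[-n])
same_as_next <- c(EP$Name[-n] == EP$Name[-1], FALSE)

# A "middle" row matches both neighbours; keep every other row
EP_trimmed <- EP[!(same_as_prev & same_as_next), ]
print(EP_trimmed)
```

Because the comparison is done on whole vectors at once, nothing is deleted while indices are still being used, which sidesteps the out-of-range problem entirely.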

fmarm

In R, you will rarely need to write a for loop explicitly.

Here, it seems you are trying to remove rows where Name is the same as in the previous row or the next row. You can use lag and lead in dplyr to get the value from the previous or next row, respectively.

library(dplyr)
EP %>% filter(Name != lag(Name) & Name != lead(Name))

Or, in data.table, we can use shift:

library(data.table)
setDT(EP)[Name != shift(Name) & Name != shift(Name, type = 'lead')]
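Note that the filters above keep only rows whose Name differs from both neighbours, so the first and last row of each run are dropped as well, along with the first and last rows of the data, where lag and lead return NA. If the aim is to drop only the middle rows and keep the first and last entry per participant, a variant along these lines may be closer. It is a sketch, assuming Name is a character column with no empty strings, so that `default = ""` never matches a real name. The example data is made up.

```r
library(dplyr)

# Hypothetical example data standing in for EP
EP <- data.frame(Name  = c("A", "A", "A", "B", "B", "C"),
                 Value = 1:6,
                 stringsAsFactors = FALSE)

# Drop a row only when it matches BOTH the previous and the next Name.
# default = "" avoids NA comparisons on the first and last rows.
EP_trimmed <- EP %>%
  filter(!(Name == lag(Name, default = "") &
           Name == lead(Name, default = "")))

print(EP_trimmed)
```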
Ronak Shah