Consider this table:

var1  var2         var3
565   P0049129/21  146
565   P0020151/04  146

I would like to scan this table, find consecutive rows where var3 has the same value (146 in this example), and remove one of those rows.

Note that the full table has other rows where var3 = 146, and I want to keep those. I only want to remove the duplication when var3 has the same value on two consecutive rows.

Thanks

Filipe Dias

3 Answers

We can use rleid from data.table and then duplicated.

df[!duplicated(data.table::rleid(df$var3)), ]

This will keep only the first row of consecutive values and delete the rest.
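
To see why this leaves non-adjacent repeats alone, here is a small sketch on a made-up data frame (`df_demo` and its values are hypothetical, not the asker's data). `rleid` gives each run of identical consecutive values its own id, so a later 146 that follows a different value starts a new run and is kept.

```r
library(data.table)

# Hypothetical data: 146 appears in rows 1-2 (consecutive) and again in row 4
df_demo <- data.frame(
  var1 = c(565L, 565L, 566L, 567L),
  var2 = c("a", "b", "c", "d"),
  var3 = c(146L, 146L, 150L, 146L)
)

# One id per run of identical consecutive values
run_id <- rleid(df_demo$var3)             # 1 1 2 3

# Keep only the first row of each run
result <- df_demo[!duplicated(run_id), ]
as.character(result$var2)                 # "a" "c" "d"
```

Only row "b" is dropped, because it is the only row that repeats the var3 value of the row directly above it.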

Ronak Shah

We can use rleid to find the runs of identical consecutive values and keep the first row of each run:

library(data.table)
i1 <- setDT(df)[, .I[1], rleid(var3)]$V1
df[i1]
#    var1        var2 var3
# 1:  565 P0049129/21  146

Another option is dplyr:

library(dplyr)
df %>%
   group_by(grp = cumsum(var3 != lag(var3, default = first(var3)))) %>%
   slice(1) %>%
   ungroup %>%
   select(-grp)
# A tibble: 1 x 3
#    var1 var2         var3
#   <int> <chr>       <int>
# 1   565 P0049129/21   146

Or we can do this in base R:

grp <- with(rle(df$var3), rep(seq_along(values), lengths))
subset(df, !duplicated(grp))
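
As a rough illustration of how the `rle` step builds the grouping vector, here is the same idea unpacked on a hypothetical vector `x` (the values are made up so that 146 also appears non-consecutively):

```r
# Hypothetical var3 values: 146 twice in a row, then 150, then 146 again
x <- c(146L, 146L, 150L, 146L)

r <- rle(x)
r$values    # 146 150 146
r$lengths   # 2 1 1

# Number the runs, then repeat each number by its run length
grp <- rep(seq_along(r$values), r$lengths)   # 1 1 2 3

# TRUE marks the first element of each run
keep <- !duplicated(grp)                     # TRUE FALSE TRUE TRUE
```

The last 146 gets its own run number, so it survives even though the value has been seen before.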

data

df <- structure(list(var1 = c(565L, 565L), var2 = c("P0049129/21", 
"P0020151/04"), var3 = c(146L, 146L)), class = "data.frame", row.names = c(NA, 
-2L))
akrun
library(data.table)
setDT(df)

df[rowid(rleid(var3)) == 1]
IceCreamToucan