How can I remove rows with the same value in 2 or more rows in R

Question

I have a data frame in the following format, with an ID column and a type column containing A or B. The data frame is very long, with over 3000 IDs.

id	type
1	A
2	B
3	A
4	A
5	B
6	A
7	B
8	A
9	B
10	A
11	A
12	A
13	B
...	...

I need to remove all rows (A and B) wherever two or more A's follow one another. So this is not ordinary duplicate removal: whenever there is a run of 2 or more A's, I want to remove all of those A's and the B that follows, up to the next A. The desired result for the sample above is:

id	type
1	A
2	B
6	A
7	B
8	A
9	B
...	...

Do I need a loop for this problem? Any help is appreciated, thank you!

Is there only one `B` after a series of `A`s, or could there be more? — Anoushiravan R, Apr 30 '21 at 13:36
You should use code to provide the example, such as `dput`. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Peace Wang, Apr 30 '21 at 13:36

Chris Ruehlemann · Accepted Answer · 2021-04-30T13:55:18.663

This might be what you want:

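First, define a function that returns the indices of the rows you want to remove:

# lead() comes from dplyr, so dplyr must be loaded before this function is called
row_sequence <- function(value) {
  # positions where a value is identical to the one directly after it
  inds <- which(value == lead(value))
  # each such position plus the next two rows (rest of the run and the row after it)
  sort(unique(c(inds, inds + 1, inds + 2)))
}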
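
As a quick check of what the function flags, here is a minimal sketch that calls it on the `type` values from the sample data. The vector `type` and the name `flagged` are introduced here only so the snippet runs on its own.

```r
library(dplyr)  # provides lead()

row_sequence <- function(value) {
  inds <- which(value == lead(value))
  sort(unique(c(inds, inds + 1, inds + 2)))
}

# same values as the type column in the sample data
type <- c("A","B","A","A","B","A","B","A","B","A","A","A","B")

flagged <- row_sequence(type)
flagged
# [1]  3  4  5 10 11 12 13
```

Note that the function compares any two neighbouring values, so two consecutive B's would be flagged as well, and it assumes exactly one B follows each run of A's.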

Then apply the function to your data frame: first extract the rows you want to remove into df1, then anti_join df1 against df to obtain the final data frame:

library(dplyr)
df1 <- df %>% slice(row_sequence(type))
# naming the join columns explicitly avoids the "Joining, by = ..." message
df2 <- df %>%
  anti_join(df1, by = c("id", "type"))

Result:

Data:

df <- data.frame(
  id = 1:13,
  type = c("A","B","A","A","B","A","B","A","B","A","A","A","B")
)

score 1 · Answer 2 · answered Apr 30 '21 at 13:46

I assumed there is only one B after each series of duplicated A values. If that is not the case, just let me know and I will modify my code:

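library(dplyr)
library(tidyr)       # drop_na()
library(data.table)  # rleid()

df %>%
  # give every run of identical consecutive values its own id
  mutate(rles = data.table::rleid(type)) %>%
  group_by(rles) %>%
  # mark runs longer than one row with NA
  mutate(rles = ifelse(length(rles) > 1, NA, rles)) %>%
  ungroup() %>%
  # also mark a B that directly follows a marked row
  mutate(rles = ifelse(!is.na(rles) & is.na(lag(rles)) & type == "B", NA, rles)) %>%
  # drop all marked rows, then remove the helper column
  drop_na() %>%
  select(-rles)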

# A tibble: 6 x 2
     id type 
  <int> <chr>
1     1 A    
2     2 B    
3     6 A    
4     7 B    
5     8 A    
6     9 B
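
To make the first step of this pipeline easier to follow, here is a small sketch of what `rleid()` returns for the sample `type` values. The names `type`, `run_id` and `repeated_runs` are introduced here for illustration only.

```r
library(data.table)  # provides rleid()

# same values as the type column in the sample data
type <- c("A","B","A","A","B","A","B","A","B","A","A","A","B")

# every run of identical consecutive values gets its own integer id
run_id <- rleid(type)
run_id
# [1]  1  2  3  3  4  5  6  7  8  9  9  9 10

# ids that occur more than once belong to runs longer than one row
repeated_runs <- unique(run_id[duplicated(run_id)])
repeated_runs
# [1] 3 9
```

Runs 3 and 9 are the two series of repeated A's, which is why the pipeline sets `rles` to NA for exactly those groups.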

Data

df <- read.table(header = TRUE, text = "
                 id     type
1   A
2   B
3   A
4   A
5   B
6   A
7   B
8   A
9   B
10  A
11  A
12  A
13  B")

Ok perhaps I misunderstood the question. Your approach is correct given the situation here — AnilGoyal, Apr 30 '21 at 13:52
I assumed there is only one `B` after a possible series of `A`s. But if that's not the case this approach will need some modifications. — Anoushiravan R, Apr 30 '21 at 13:56
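
If more than one `B` can follow a series of `A`s, one possible modification is to work on whole runs instead of single rows. The following is only a base R sketch under that assumption, not the approach of either answer; the names `runs`, `long_a`, `b_after`, `keep` and `result` are made up for this example.

```r
df <- data.frame(
  id = 1:13,
  type = c("A","B","A","A","B","A","B","A","B","A","A","A","B")
)

# run-length encode the type column (as.character() guards against factors)
runs <- rle(as.character(df$type))

# runs of two or more consecutive A's
long_a <- runs$values == "A" & runs$lengths > 1

# the B run (of any length) that directly follows such an A run
b_after <- runs$values == "B" & c(FALSE, head(long_a, -1))

# expand the per-run flags back to one flag per row and keep the rest
keep <- !rep(long_a | b_after, times = runs$lengths)
result <- df[keep, ]
result$id
# [1] 1 2 6 7 8 9
```

Because the flags are computed per run, no explicit loop is needed, and a block of several B's after repeated A's would be dropped as a whole.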
