I have a dataframe in the following format, with an id column and a type column containing A's and B's. The dataframe is very long, with over 3000 IDs.

id type
1 A
2 B
3 A
4 A
5 B
6 A
7 B
8 A
9 B
10 A
11 A
12 A
13 B
... ...

I need to remove all rows (A's and the following B) wherever two or more A's occur consecutively. So I don't just want to remove the duplicates: if there is a run of two or more A's, I want to remove all of those A's and the B that follows, up to the next A. The expected result is:

id type
1 A
2 B
6 A
7 B
8 A
9 B
... ...

Do I need a loop for this problem? Any help is appreciated, thank you!

Flow91

2 Answers

This might be what you want:

First, define a function that notes the indices of what you want to remove:

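# Requires dplyr for lead(); load it before calling this function.
row_sequence <- function(value) {
  # Positions where a value equals the next one (start of a repeat)
  inds <- which(value == lead(value))
  # Flag each such row plus the two rows after it (the repeated A and the B)
  sort(unique(c(inds, inds + 1, inds + 2)))
}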
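
To see what those indices look like, here is a small trace of the same steps on the sample `type` column from the question. It is only an illustration; the vector name `type` and the intermediate `idx` are chosen here for the example.

```r
library(dplyr)  # for lead()

# Sample type column from the question
type <- c("A","B","A","A","B","A","B","A","B","A","A","A","B")

# Positions where a value is identical to the one right after it
inds <- which(type == lead(type))
inds
# 3 10 11

# Each hit plus the two following positions, deduplicated and sorted
idx <- sort(unique(c(inds, inds + 1, inds + 2)))
idx
# 3 4 5 10 11 12 13
```

Note that the comparison matches any repeated value, so two consecutive B's would also be flagged, and the `+ 2` offset relies on exactly one B following each run of A's.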

Apply the function to your dataframe: first extract the rows you want to remove into df1, then anti_join df with df1 to obtain the final dataframe:

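library(dplyr)
# Rows to drop: runs of repeated A's and the B that follows them
df1 <- df %>% slice(row_sequence(type))
# Keep every row of df that has no match in df1
df2 <- df %>%
  anti_join(df1, by = c("id", "type"))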

Result:

df2
  id type
1  1    A
2  2    B
3  6    A
4  7    B
5  8    A
6  9    B

Data:

df <- data.frame(
  id = 1:13,
  type = c("A","B","A","A","B","A","B","A","B","A","A","A","B")
)
Chris Ruehlemann

I assumed there is only one B after each run of duplicated A values. If that is not the case, let me know and I will modify my code:

library(dplyr)
library(tidyr)
library(data.table)

df %>%
  mutate(rles = data.table::rleid(type)) %>%
  group_by(rles) %>%
  mutate(rles = ifelse(length(rles) > 1, NA, rles)) %>%
  ungroup() %>%
  mutate(rles = ifelse(!is.na(rles) & is.na(lag(rles)) & type == "B", NA, rles)) %>%
  drop_na() %>%
  select(-rles)

# A tibble: 6 x 2
     id type 
  <int> <chr>
1     1 A    
2     2 B    
3     6 A    
4     7 B    
5     8 A    
6     9 B 

Data:

df <- read.table(header = TRUE, text = "
                 id     type
1   A
2   B
3   A
4   A
5   B
6   A
7   B
8   A
9   B
10  A
11  A
12  A
13  B")
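
If more than one B can follow a run of duplicated A's, a run-length approach in base R might look like the following. This is a minimal sketch, not a tested replacement for the pipeline above: it assumes that the whole run of B's directly after a duplicated-A run should be dropped, and the helper names (`run_id`, `run_len`, `dup_a`, `after_dup`, `result`) are made up for this illustration.

```r
# Sample data from the question
df <- data.frame(
  id = 1:13,
  type = c("A","B","A","A","B","A","B","A","B","A","A","A","B")
)

# rle() needs an atomic vector, so convert in case type is a factor
type <- as.character(df$type)
r <- rle(type)

# For every row: which run it belongs to, and how long that run is
run_id  <- rep(seq_along(r$lengths), r$lengths)
run_len <- rep(r$lengths, r$lengths)

# Rows that sit in a run of two or more A's
dup_a <- type == "A" & run_len > 1

# Runs of duplicated A's; the run right after each one is the B run to drop
# (assumption: all B's up to the next A are removed)
bad_runs  <- which(r$values == "A" & r$lengths > 1)
after_dup <- type == "B" & run_id %in% (bad_runs + 1)

result <- df[!(dup_a | after_dup), ]
result$id
# 1 2 6 7 8 9
```

No explicit loop is needed: `rle()` and `rep()` label each row with its run, and the filtering is a single vectorised subset.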
Anoushiravan R
  • Ok, perhaps I misunderstood the question. Your approach is correct given the situation here. – AnilGoyal Apr 30 '21 at 13:52
  • I assumed there is only one `B` after a possible series of `A`s. But if that's not the case this approach will need some modifications. – Anoushiravan R Apr 30 '21 at 13:56