I have the dataframe below, where each row represents a change in text. I use the adist() function to classify each character-level change as a match (M), insertion (I), substitution (S) or deletion (D).
I need to find the indices of all the I's in the change column (illustrated here in the insertion_idx column). Using those indices, I need to extract the corresponding characters in current_text (illustrated here in insertion_chars).
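library(tibble)

df <- tibble(current_text = c("A","AB","ABCD","ABZ"),
             previous_text = c("","A","AB","ABCD"),
             change = c("I","MI","MMII","MMSD"),
             # list-column: rows hold differing numbers of indices (none for the last row)
             insertion_idx = list(1, 2, c(3, 4), integer(0)),
             insertion_chars = c("A","B","CD",""))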
I have tried splitting up strings and comparing string differences, but this gets very messy very fast with real-world data. How do I accomplish the above task?
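For reference, here is a minimal base R sketch of the result I am after, not a tested solution for real-world data. It assumes the operations in change map onto the characters of current_text in order once the D operations are dropped, since a deletion has no character in current_text. The vectors are copied from the example above so the snippet runs on its own.

```r
# Example vectors copied from the question's dataframe.
change       <- c("I", "MI", "MMII", "MMSD")
current_text <- c("A", "AB", "ABCD", "ABZ")

# Split each change string into single operations.
ops_list <- strsplit(change, "")

# Positions of "I" within each change string (integer(0) when there are none).
insertion_idx <- lapply(ops_list, function(ops) which(ops == "I"))

# Assumption: "D" consumes no character of current_text, so drop it before
# mapping the remaining operations onto current_text positions.
insertion_chars <- mapply(function(ops, txt) {
  ops   <- ops[ops != "D"]
  chars <- strsplit(txt, "")[[1]]
  paste(chars[which(ops == "I")], collapse = "")
}, ops_list, current_text, USE.NAMES = FALSE)

insertion_chars
# [1] "A"  "B"  "CD" ""
```

In the example data the positions in change and in current_text happen to coincide, but that would stop being true as soon as a D appears before an I, which is why the sketch filters out D before indexing.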