
I have a column of strings, and I am trying to count how many times a letter occurs twice in a row, including overlapping pairs (so `YYYYY` should count as 4 occurrences of `YY`).

j <- data.frame(states= c("AYYOOYYYYYZ", "AYYCCYYYYYZ", "AYYCOOYYCCZ"))

I used `str_count`, but it returns 3 3 2 as the counts for `YY` instead of 5 5 2.
I tried regex and `gregexpr` but couldn't get the overlapping counts.

library(stringr)
str_count(j$states, "YY")
[1] 3 3 2
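`str_count()` counts non-overlapping matches. In `AYYOOYYYYYZ`, after it matches `YY` at positions 6 and 7 the search resumes at position 8, so the run `YYYYY` yields only two matches instead of four. A minimal base R sketch of the overlapping count with `gregexpr()` follows. It assumes `perl = TRUE`, which is required because R's default regex engine does not support the zero-width lookahead `(?=YY)`; `count_overlapping()` is a hypothetical helper name, not part of any package.

```r
j <- data.frame(states = c("AYYOOYYYYYZ", "AYYCCYYYYYZ", "AYYCOOYYCCZ"))

# Hypothetical helper: count overlapping occurrences of `pattern` in each string.
# The lookahead (?=...) matches an empty string, so the engine advances one
# character at a time and every starting position of the pair is reported.
count_overlapping <- function(x, pattern) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)
  # gregexpr() returns -1 for a string with no match, so count positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

count_overlapping(j$states, "YY")
# [1] 5 5 2
```

The same call with `"OO"` and `"CC"` gives the other two expected columns.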

Expected output:

structure(list(states = c("AYYOOYYYYYZ", "AYYCCYYYYYZ", "AYYCOOYYCCZ"
), rec_yy = c(5, 5, 2), rec_oo = c(1, 0, 1), rec_cc = c(0, 1, 
1)), row.names = c(NA, -3L), class = "data.frame")

I appreciate your help!

AndrewGB
geek v
Is this a duplicate? https://stackoverflow.com/questions/23840641/count-the-number-of-overlapping-substrings-within-a-string https://stackoverflow.com/questions/7878992/finding-the-indexes-of-multiple-overlapping-matching-substrings – Ben Bolker Jan 05 '22 at 00:39
Thanks! I will close the question! – geek v Jan 05 '22 at 00:52

1 Answer

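library(dplyr)
library(stringr)

# '(?=YY)' is a zero-width lookahead, so every position where a pair starts is counted
j %>% 
  mutate(rec_yy = str_count(states, '(?=YY)'),
         rec_oo = str_count(states, '(?=OO)'),
         rec_cc = str_count(states, '(?=CC)'))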

       states rec_yy rec_oo rec_cc
1 AYYOOYYYYYZ      5      1      0
2 AYYCCYYYYYZ      5      0      1
3 AYYCOOYYCCZ      2      1      1
Onyambu
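The lookahead in the answer works because `(?=YY)` matches an empty string at each position followed by `YY`, so no characters are consumed and overlapping pairs are all found. A regex-free alternative, sketched here in base R as a swapped-in technique rather than part of the answer: a run of n identical letters contains n - 1 overlapping pairs, so run-length encoding with `rle()` gives the same counts. The helper `count_pairs()` and the loop over `c("Y", "O", "C")` are illustrative assumptions; `stringsAsFactors = FALSE` is set so the code also behaves the same on R versions before 4.0.

```r
j <- data.frame(states = c("AYYOOYYYYYZ", "AYYCCYYYYYZ", "AYYCOOYYCCZ"),
                stringsAsFactors = FALSE)

# Hypothetical helper: a run of n copies of `letter` contains n - 1 overlapping
# pairs, e.g. "YYYYY" (n = 5) contains "YY" 4 times.
count_pairs <- function(s, letter) {
  runs <- rle(strsplit(s, "")[[1]])
  n <- runs$lengths[runs$values == letter]
  sum(pmax(n - 1L, 0L))  # 0 when the letter never appears
}

# Build rec_yy, rec_oo, rec_cc without one line per letter
for (l in c("Y", "O", "C")) {
  col <- paste0("rec_", tolower(l), tolower(l))
  j[[col]] <- vapply(j$states, count_pairs, integer(1), letter = l,
                     USE.NAMES = FALSE)
}

j
# rec_yy = 5 5 2, rec_oo = 1 0 1, rec_cc = 0 1 1
```

Unlike the regex version, this counts only pairs of the same letter, which matches the question but would not extend to mixed patterns such as `YO`.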