Count number of occurrences when string contains substring

Question

I have a string like

'abbb'

I need to find out how many times the substring 'bb' occurs in it.

grep('bb','abbb')

returns 1. However, the answer I want is 2 (a-bb and ab-bb, counting overlapping matches). How can I count the occurrences this way?

How large is your real problem? (In other words, how important is efficiency?) — Heroka, Feb 22 '16 at 20:46
the data is small, I'm just curious if it can be realized in R — Lionir, Feb 22 '16 at 20:47
One possibility would be to go back to http://stackoverflow.com/questions/35561641/find-all-possible-substrings-of-length-n and then `sum(allsubstr('abbb', nchar('bb')) == 'bb')`, **where** my function `allsubstr` no longer uses `unique`. — Julius Vainora, Feb 22 '16 at 20:51
similar http://stackoverflow.com/questions/25800042/overlapping-matches-in-r — rawr, Feb 22 '16 at 20:53

score 7 · Answer 1 · answered Feb 22 '16 at 20:50

You can make the pattern non-consuming with '(?=bb)', as in:

x <- 'abbb'
length(gregexpr('(?=bb)', x, perl=TRUE)[[1]])
[1] 2
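
One caveat with the `length()` idiom: `gregexpr` returns -1 when nothing matches, so the length is 1 even for zero occurrences. A minimal sketch of a wrapper that counts only real match positions and builds the lookahead from a variable follows. The name `count_overlapping` is hypothetical (not part of base R), and the sketch assumes the search string contains no regex metacharacters.

```r
# Hypothetical helper: count overlapping occurrences of `pattern`
# in each element of `text`, using a zero-width lookahead.
count_overlapping <- function(pattern, text) {
  # Build the lookahead from the variable's value, not its name
  lookahead <- paste0("(?=", pattern, ")")
  matches <- gregexpr(lookahead, text, perl = TRUE)
  # gregexpr reports "no match" as -1, so count only positive positions
  vapply(matches, function(m) sum(m > 0), integer(1))
}

count_overlapping("bb", "abbb")
count_overlapping("bb", c("aa", "bb", "ba"))
```

Because `gregexpr` is vectorised over `text`, the helper returns one count per input string; looping is only needed over several different patterns.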

Pierre L

It works when we search for a single substring, but not when we have to search for multiple substrings in a loop: `x=c('aa','bb','ba')`, `length(gregexpr('(?=x[2])', x, perl=TRUE)[[1]])` – Lionir Feb 22 '16 at 21:12
A loop appears to be unavoidable. – Pierre L Feb 22 '16 at 21:17

score 3 · Accepted Answer · answered Feb 22 '16 at 20:49

Here is an ugly approach using substr and sapply:

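input <- "abbb"
search <- "bb"

# Slide a window of nchar(search) characters along input and count
# the positions where the window equals search.
res <- sum(sapply(1:(nchar(input)-nchar(search)+1), function(i) {
  substr(input, i, i+(nchar(search)-1)) == search
}))
res
# [1] 2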
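
The same sliding-window idea can also be sketched without an explicit `sapply`, assuming base R's vectorised `substring`. The function name `count_windows` is hypothetical, and the early return for a search string longer than the input is an added assumption about the desired behaviour.

```r
# Hypothetical helper: count overlapping occurrences of `search`
# in a single string `input` by comparing every window at once.
count_windows <- function(input, search) {
  n <- nchar(search)
  last_start <- nchar(input) - n + 1
  # No window fits when search is longer than input
  if (last_start < 1) return(0L)
  starts <- seq_len(last_start)
  # substring() is vectorised over the start and stop positions
  sum(substring(input, starts, starts + n - 1) == search)
}

count_windows("abbb", "bb")
```

Since this compares literal text rather than a regular expression, characters such as `.` or `(` in the search string need no escaping.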

Heroka

score 1 · Answer 3 · answered Feb 23 '16 at 04:02

We can use stri_count_regex from the stringi package, again with a lookahead pattern:

library(stringi)

input <- "abbb"
x <- c('aa','bb','ba')

stri_count_regex(input, '(?=bb)')
#[1] 2

stri_count_regex(x, '(?=bb)')
#[1] 0 1 0

akrun
