4

I have a string like

'abbb'

I need to find out how many times the substring 'bb' occurs in it, counting overlapping matches.

grep('bb','abbb')

returns 1. However, the answer I want is 2 (a[bb]b and ab[bb]). How can I count the number of occurrences the way I need?

rawr
Lionir
  • How large is your real problem? (In other words, how important is efficiency?) – Heroka Feb 22 '16 at 20:46
  • the data is small, I'm just curious if it can be realized in R – Lionir Feb 22 '16 at 20:47
  • One possibility would be to go back to http://stackoverflow.com/questions/35561641/find-all-possible-substrings-of-length-n and then `sum(allsubstr('abbb', nchar('bb')) == 'bb')`, **where** my function `allsubstr` no longer uses `unique`. – Julius Vainora Feb 22 '16 at 20:51
  • similar http://stackoverflow.com/questions/25800042/overlapping-matches-in-r – rawr Feb 22 '16 at 20:53
  • `table(Vectorize(substr)('abbb', 1:3, 1:3 + 1))` – rawr Feb 22 '16 at 21:02

3 Answers

7

You can make the pattern non-consuming with '(?=bb)', as in:

x <- 'abbb'
# gregexpr returns -1 when nothing matches, so count only positive positions
sum(gregexpr('(?=bb)', x, perl=TRUE)[[1]] > 0)
[1] 2
Pierre L
  • It works when we search for a single substring, but it doesn't when we have to search for multiple substrings in a loop: `x=c('aa','bb','ba'); length(gregexpr('(?=x[2])', x, perl=TRUE)[[1]])` – Lionir Feb 22 '16 at 21:12
  • A loop appears to be unavoidable. – Pierre L Feb 22 '16 at 21:17
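
The attempt in the comment above fails because `x[2]` inside the quotes is literal pattern text, not the value of the variable. A minimal sketch of one way around this is to build the lookahead with `paste0()` and loop over the substrings with `sapply()`. The helper name `count_lookahead` and the vector `patterns` are hypothetical, introduced only for illustration, and the sketch assumes the substrings contain no regex metacharacters (those would need escaping first).

```r
# Hypothetical helper: overlapping matches of `pattern` in each element of `x`
count_lookahead <- function(pattern, x) {
  # paste0() splices the value of `pattern` into the lookahead
  hits <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)
  # gregexpr() reports -1 for "no match", so count only positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

patterns <- c("aa", "bb", "ba")  # assumed list of substrings to look for
counts <- sapply(patterns, count_lookahead, x = "abbb")
counts
# aa bb ba
#  0  2  0
```

The loop is still there, but it is hidden inside `sapply()`, which agrees with the remark that a loop seems hard to avoid with this approach.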
3

Here is an ugly approach using substr and sapply:

input <- "abbb"

search <- "bb"


# slide a window of width nchar(search) over input and count the exact matches
res <- sum(sapply(1:(nchar(input)-nchar(search)+1),function(i){
  substr(input,i,i+(nchar(search)-1))==search
}))
Heroka
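
The sliding-window idea in this answer can also be written without an explicit loop, because base R's `substring()` is vectorised over its start and stop positions. The following is only a sketch under that assumption. The function name `count_overlapping` is hypothetical, and the early return for an empty or too-long search string is an added guard, not part of the original answer.

```r
# Hypothetical helper: count overlapping occurrences of a fixed substring
count_overlapping <- function(input, search) {
  n <- nchar(input) - nchar(search) + 1  # number of window positions
  # guard: empty search string, or search longer than input
  if (nchar(search) == 0 || n < 1) return(0L)
  starts <- seq_len(n)
  # extract every window of width nchar(search) in one call and compare
  sum(substring(input, starts, starts + nchar(search) - 1) == search)
}

count_overlapping("abbb", "bb")
# [1] 2
```

Because the comparison uses `==` on fixed strings, regex metacharacters in `search` need no escaping here.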
1

We can use stri_count_regex from the stringi package, which is vectorised over the input strings:

library(stringi)
stri_count_regex(input, '(?=bb)')
#[1] 2

stri_count_regex(x, '(?=bb)')
#[1] 0 1 0

data

input <- "abbb"
x <- c('aa','bb','ba')
akrun