
I'm trying to find all the words with three consecutive double letters, e.g., bookkeeper.

Currently it's giving me any word with double letters, rather than only words with three consecutive sets.

This is where I am at:

library(dplyr)

Collins <- Collins %>%
  filter(nchar(test) >= 5)

dbl_letter <- function(word, position){
  substring(word, position, position) == substring(word, position+1, position+1)
}

for(word in Collins$test){
  for(i in 1:nchar(word)){
    if(dbl_letter(word,i) == TRUE & dbl_letter(word,i+2) == TRUE & dbl_letter(word,i+4) == TRUE){
      print(word)
    }
  }
}
clavat245
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 19 '20 at 18:35
  • Create a regular expression, and find a function which matches strings against a regular expression. – user31264 May 19 '20 at 18:59
  • In your code you need to loop to nchar(word)-5, not nchar(word). – user31264 May 19 '20 at 19:00
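Following user31264's comment, here is a minimal sketch of the loop with the range fixed. One likely source of the extra matches: substring() returns "" for positions past the end of the word, so dbl_letter() ends up comparing "" == "", which is TRUE, and any word that ends in a double letter slips through. The sketch uses a small made-up word vector in place of Collins$test, and has_three_doubles_loop() is a hypothetical helper name; it also stops at the first hit so each word is reported only once.

```r
# Hypothetical stand-in for Collins$test (the real data set is not shown)
words <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")

dbl_letter <- function(word, position) {
  substring(word, position, position) == substring(word, position + 1, position + 1)
}

# Hypothetical helper: TRUE if `word` contains three consecutive double letters
has_three_doubles_loop <- function(word) {
  n <- nchar(word)
  if (n < 6) return(FALSE)          # three double letters need at least 6 characters
  for (i in 1:(n - 5)) {            # last start position keeps i + 5 <= n
    if (dbl_letter(word, i) && dbl_letter(word, i + 2) && dbl_letter(word, i + 4)) {
      return(TRUE)                  # stop at the first hit, so each word counts once
    }
  }
  FALSE
}

# Keep only the words for which the helper returns TRUE
matches <- Filter(has_three_doubles_loop, words)
print(matches)
```

The early `n < 6` check matters because `1:(n - 5)` would otherwise count downwards for short words; with it, the original `nchar(test) >= 5` filter could also be raised to `>= 6`.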

1 Answer


Using a regular expression could help you out:

word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")
grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", word)

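grepl returns TRUE if a word contains three consecutive double letters (including six repeats of the same letter) and FALSE otherwise. Each ([A-Za-z]) captures one letter and the following \\1, \\2 or \\3 requires that same letter again. For the vector word defined above it gives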

[1]  TRUE FALSE FALSE FALSE  TRUE FALSE
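To see which part of each word the pattern actually matched, regexpr() can be combined with regmatches(). A small sketch reusing the same word vector and pattern:

```r
word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")
pattern <- "([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3"

# regexpr() records the start and length of the first match in each word (-1 if none)
m <- regexpr(pattern, word)

# regmatches() returns the matched substrings, only for words that matched
hits <- regmatches(word, m)
print(hits)
```

For this vector, hits holds "ookkee" (from bookkeeper) and "oooooo".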

Ignoring the case

The regular expression given above fails when the letters of a double pair differ in case, so boOkKeEper fails the test. We can solve this by converting the words to lower case (or upper case):

grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", tolower(word))

Since the input is now all lower case, the regular expression simplifies to

grepl("([a-z])\\1([a-z])\\2([a-z])\\3", tolower(word))
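Applied to the question's data, the lower-case pattern can serve directly as a row filter, with no explicit loop. A sketch in base R, assuming test is a character column; the Collins data frame below is a made-up stand-in, since the real data isn't shown:

```r
# Hypothetical stand-in for the asker's Collins data frame
Collins <- data.frame(
  test = c("bookkeeper", "BoOkKeEpEr", "parrot", "committee", "oooooo"),
  stringsAsFactors = FALSE
)

# Lower-case first so mixed-case pairs such as "oO" still count as doubles
keep <- grepl("([a-z])\\1([a-z])\\2([a-z])\\3", tolower(Collins$test))

# Subset the rows; drop = FALSE keeps the result a data frame
result <- Collins[keep, , drop = FALSE]
print(result$test)
```

With dplyr loaded, the same condition fits inside filter(), as in the question: Collins %>% filter(grepl("([a-z])\\1([a-z])\\2([a-z])\\3", tolower(test))).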
Martin Gal