
I have a list of words. I want to count the words in which a certain letter appears repeatedly. I don't mind how many times the letter appears, as long as it appears at least twice, and I don't mind whether the repetitions are adjacent or not. For example, I want to include both "ppa" and "pepa".

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

Say this is my list. My target letter is "p". I want to count the words that contain at least two "p"s, so I want to count "apple", "pineapple", and "papaya". The number I want to obtain is 3.

I've tried

str_detect(fruit, "p[abcdefghijklmnoqrstuvwxyz]p")

But this does not count "apple" and "pineapple". Is there a way to have all three words included?
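The attempt misses two words because a character class with no quantifier matches exactly one character. A minimal sketch, writing the same class with ranges (`[a-oq-z]`, any lowercase letter except "p"), shows which words it can and cannot reach:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# With no quantifier, the class matches exactly ONE letter, so the pattern
# only finds two p's separated by exactly one non-p letter ("pap" in "papaya").
str_detect(fruit, "p[a-oq-z]p")
#[1] FALSE FALSE FALSE FALSE  TRUE

# "apple" fails because its two p's are adjacent (zero letters between them);
# "pineapple" fails because its p's are either adjacent or four letters apart.
str_detect("pp", "p[a-oq-z]p")
#[1] FALSE
```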

Wiktor Stribiżew
Ditto
  • I would guess `str_detect(fruit, "p[a-z]*p")`? There are many, many ways to detect "at least two 'p'" - does that solve your problem? – jared_mamrot Dec 18 '22 at 09:09
  • You missed the [quantifier](https://www.regular-expressions.info/repeat.html) and can use a [range](https://www.regular-expressions.info/charclass.html): [`(?i)p[a-oq-z]*p`](https://regex101.com/r/SKpKzw/1) with `(?i)` [flag](https://www.regular-expressions.info/modifiers.html) to [*ignore case*](https://stackoverflow.com/questions/44530029/how-to-ignore-case-when-using-str-detect). – bobble bubble Dec 18 '22 at 13:08
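A minimal sketch of the fix suggested in the comments, assuming stringr (whose ICU regex engine accepts the inline `(?i)` flag; `regex(..., ignore_case = TRUE)` is the equivalent stringr modifier). The extra word "Pepper" is a made-up input added only to show the case-insensitive match:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")
mixed_case <- c(fruit, "Pepper")  # "Pepper" is a hypothetical extra word

# `*` allows zero or more non-p letters between the two p's;
# ignore_case = TRUE does the same job as the inline (?i) flag.
has_two_p <- str_detect(mixed_case, regex("p[a-oq-z]*p", ignore_case = TRUE))
has_two_p
#[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE
sum(has_two_p)
#[1] 4
```

On the original `fruit` vector alone, the same pattern gives a count of 3.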

2 Answers


An approach that avoids writing a repetition regex is to count the number of 'p's in each fruit. This can be done with the str_count function.

library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple"     "pineapple" "papaya"   

If you want the output to be 3, sum the logical vector instead of using it to subset.

sum(str_count(fruit, 'p') > 1)
#[1] 3

Here str_count returns the number of times the pattern (in our case 'p') occurs in each string:

str_count(fruit, 'p')
#[1] 2 0 1 3 2
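The same idea can be wrapped in a small reusable function. A minimal sketch; the name `count_repeated_letter` and its `min_times` argument are hypothetical, and `fixed()` is used so that the letter is matched literally rather than as a regex (for example, "." is not treated as a wildcard):

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# Hypothetical helper: count the words in which `letter` occurs at least
# `min_times` times (adjacent or not).
count_repeated_letter <- function(words, letter, min_times = 2) {
  sum(str_count(words, fixed(letter)) >= min_times)
}

count_repeated_letter(fruit, "p")
#[1] 3
count_repeated_letter(fruit, "a")
#[1] 2
```

Raising `min_times` to 3 would count only "pineapple" for "p".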
Ronak Shah

If you really want to use a regex to solve this problem, one of the many possible patterns is:

p[a-zA-Z]*p

The regex looks for a 'p', followed by any number of letters (including none), followed by another 'p', so it matches any word that contains at least two 'p's. The number of words that match is the count you are looking for.
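A minimal sketch of applying this pattern in R, assuming stringr as in the other answer. str_extract is included only to show what each match covers; because `*` is greedy, a match runs from the first 'p' to the last 'p' in the word:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# TRUE for each word with two p's and any letters (or none) between them
matches <- str_detect(fruit, "p[a-zA-Z]*p")
sum(matches)
#[1] 3

# Greedy matching: returns c("pp", NA, NA, "pineapp", "pap")
str_extract(fruit, "p[a-zA-Z]*p")
```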


CinCout