This is my first time using Stack Overflow (thanks in advance for the help). I am trying to write a regex in R that picks out countries whose names contain the letter e but not the string ee:

Example: countries <- c("USA", "Lebanon", "Greece", "Mexico")

Desired Output: "Lebanon", "Mexico"

I tried the code below, without success: str_subset(countries, pattern = "[^ee]e")

1 Answer

We can make use of the negate argument in str_subset.

library(stringr)
str_subset(countries, pattern = "(?<=(?i)e)((?i)e)|^([^Ee]+)$", 
        negate = TRUE)
#[1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE" 

Here, the pattern matches either a case-insensitive ((?i)) 'e' that follows another case-insensitive 'e' (checked with the regex lookbehind (?<=)), or (|) a string made up entirely of characters other than "E" or "e" from start (^) to end ($), i.e. a word with no "E" or "e" at all. Setting negate = TRUE then returns the elements that match neither alternative, which are the words that contain an 'e' but no 'ee'.
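For readers who find the combined pattern hard to parse, the same two conditions can also be checked separately. The following is a minimal base R sketch, not part of the original answer: it swaps str_subset for grepl, and the names has_e, has_ee and result are illustrative only.

```r
# Same data as in the "data" section of this answer
countries <- c("USA", "Lebanon", "Greece", "Mexico", "Egypt", "France", "FRANCE")

# Condition 1: the name contains at least one "e" (either case)
has_e <- grepl("e", countries, ignore.case = TRUE)

# Condition 2: the name contains "ee" (either case)
has_ee <- grepl("ee", countries, ignore.case = TRUE)

# Keep names that satisfy condition 1 but not condition 2
result <- countries[has_e & !has_ee]
print(result)
```

Splitting the test this way trades a little verbosity for a pattern that needs no lookarounds or negation.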


Or using str_count

# keep names with at least one "e" (either case) and no "ee" anywhere
countries[str_count(countries, "(?i)e") > 0 & str_count(countries, "(?i)ee") == 0]
#[1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE"  
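Another option, offered here as a hedged sketch rather than as part of the original answer, is to fold both conditions into a single pattern with a negative lookahead: (?!.*ee) at the start rejects any string that contains "ee", and the trailing .*e requires at least one "e". The example uses base R's grepl with perl = TRUE. The vector more and its extra names are hypothetical test inputs, chosen to cover a name with two separate e's and a name with no e at all.

```r
# Hypothetical inputs: part of the original data plus a few extra edge cases
more <- c("USA", "Lebanon", "Greece", "Mexico", "Egypt", "FRANCE",
          "Belize", "Greenland", "Chad")

# ^(?!.*ee) : from the start, fail if "ee" occurs anywhere in the string
# .*e       : otherwise require at least one "e"
keep <- grepl("^(?!.*ee).*e", more, perl = TRUE, ignore.case = TRUE)

picked <- more[keep]
print(picked)
```

"Belize" is kept because its two e's are not adjacent, while "Greenland" is dropped because of its "ee". With stringr, the same pattern should work as str_subset(countries, regex("^(?!.*ee).*e", ignore_case = TRUE)).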

EDIT: Included some more edge cases, as mentioned by @G5W.

data

countries <- c("USA", "Lebanon", "Greece", "Mexico", "Egypt", "France", "FRANCE")
akrun