How to Write a RegEx function in R that picks countries that have the letter e but not the string ee

Question

This is my first time using Stack Overflow (thanks in advance for the help). I am trying to write a regex in R that picks countries that contain the letter e but not the string ee:

Example: countries <- c("USA", "Lebanon", "Greece", "Mexico")

Desired Output: "Lebanon", "Mexico"

I tried the code below, but with no luck: str_subset(countries, pattern = "[^ee]e")

Use `^(?!.*ee).*e` – Wiktor Stribiżew, Nov 30 '19 at 23:09
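A minimal sketch of the pattern suggested in the comment above. It uses base R's grep with perl = TRUE, which is an assumption made here so the example runs without extra packages; the comment does not say which function to use. The variable names extended, matched and matched_ci are illustrative only.

```r
countries <- c("USA", "Lebanon", "Greece", "Mexico")

# ^(?!.*ee) : at the start, reject the string if "ee" occurs anywhere in it
# .*e       : then require at least one "e"
# perl = TRUE is required because lookaheads are a PCRE feature
matched <- grep("^(?!.*ee).*e", countries, perl = TRUE, value = TRUE)
print(matched)
# [1] "Lebanon" "Mexico"

# The pattern is case-sensitive as written; prefixing (?i) also accepts "E"
extended <- c(countries, "Egypt", "France", "FRANCE")
matched_ci <- grep("(?i)^(?!.*ee).*e", extended, perl = TRUE, value = TRUE)
print(matched_ci)
# [1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE"
```

The negative lookahead is anchored at the start of the string, so it checks the whole string for "ee" before the rest of the pattern looks for a single "e".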

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

We can make use of the negate argument in str_subset.

library(stringr)
str_subset(countries, pattern = "(?<=(?i)e)((?i)e)|^([^Ee]+)$", 
        negate = TRUE)
#[1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE"

Here, we match a case-insensitive ((?i)) 'e' that follows a case-insensitive 'e' (expressed with the regex lookbehind (?<=...)), or (|) a string made up entirely of characters that are not "E" or "e" from the start (^) to the end ($) (essentially matching words with no "E" or "e" character), and use negate = TRUE to keep only the strings that do not match.
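The two alternatives in that pattern can also be checked separately, which may make the logic easier to follow. The sketch below uses base R's grepl instead of stringr (an assumption, so that it runs without extra packages); has_ee, no_e and kept are illustrative names that do not appear in the answer.

```r
countries <- c("USA", "Lebanon", "Greece", "Mexico", "Egypt", "France", "FRANCE")

# First alternative: an "e" directly followed by another "e", in any case
has_ee <- grepl("ee", countries, ignore.case = TRUE)

# Second alternative: no "e" or "E" anywhere in the string
no_e <- !grepl("e", countries, ignore.case = TRUE)

# negate = TRUE corresponds to keeping strings that hit neither alternative
kept <- countries[!(has_ee | no_e)]
print(kept)
# [1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE"
```

Splitting the test into two logical vectors trades the single regex for two simpler ones, so each condition can be inspected on its own.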

Or using str_count

# keep strings with at least one "e"/"E" and no "ee" (in any case)
countries[str_count(countries, "(?i)e") > 0 & str_count(countries, "(?i)ee") == 0]
#[1] "Lebanon" "Mexico"  "Egypt"   "France"  "FRANCE"

EDIT: Included some more edge cases, as mentioned by @G5W

data

countries <- c("USA", "Lebanon", "Greece", "Mexico", "Egypt", "France", "FRANCE")

@G5W Thanks, I was just checking the 'France' part, as I didn't see the update – akrun, Nov 30 '19 at 22:39
