How to get all possible combinations from the following regex pattern?

"(ost|west)europäisch(*$|e(*$|r|s|n|m))"

I would like to get a vector that looks like this:

[1] "osteuropäisch"    "westeuropäisch"   "osteuropäische"   "westeuropäische" 
[5] "osteuropäischer"  "westeuropäischer" "osteuropäisches"  "westeuropäisches"
[9] "osteuropäischen"  "westeuropäischen" "osteuropäischem"  "westeuropäischem"

From a related question, I understand that I can get all combinations with the following code:

do.call(paste0, expand.grid(
   c("ost","west"),
   "europäisch",
   c("", paste0("e", c("", "r", "s", "n", "m"))))
)

However, I have a large number of different regex patterns that I need to convert into full strings. Therefore, I was wondering whether there is a function or package for R that can expand a regex pattern into a vector of all the strings it matches.

So far I have not found any such function in base R, stringi or stringr.

From a similar question on regex combinations in Python, I know that the exrex module exists for Python. Maybe something similar exists for R that I have not been able to find?

xyz

To answer this question it would be great to see what "other different regex patterns" you have, and in which format, and why the approach above is not easily working in your case. – TimTeaFan Mar 17 '23 at 12:41

1 Answer

This almost works for your example - you could easily encapsulate it in a function. I don't know if it will break for different/more exotic examples ...

(This doesn't quite work right for the "*$" before "e": for example, "westeuropäischm", with no "e" between "h" and "m", shouldn't be an option, but the code below still produces it.)

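str <- "(ost|west)europäisch(*$|e(*$|r|s|n|m))"
# split on parentheses and drop the empty pieces
s1 <- strsplit(str, "[()]")[[1]]
s1 <- s1[nzchar(s1)]
# split each piece into its alternatives
s2 <- strsplit(s1, "\\|")
# strip "*" and "$", so that "*$" becomes the empty string
s2 <- lapply(s2, gsub, pattern = "\\*|\\$", replacement = "")
# one alternative per piece, in every combination (nesting is ignored here)
s3 <- do.call(expand.grid, s2)
res <- apply(s3, 1, paste, collapse = "")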

Check:

grepl(str, res)

(doesn't quite work yet)
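One way to respect the nesting would be a small recursive expander. What follows is only a rough sketch under some assumptions: `expand_pattern` is a hypothetical helper (not from any package), it only understands literal text, "|" and balanced "(...)" groups, and it treats "*$" as "nothing here" by dropping "*" and "$", just like the code above. Quantifiers, character classes and escapes are not handled.

```r
# Hypothetical helper: expand a pattern made of literal text, "|" and
# (possibly nested) "(...)" groups into every string it describes.
# Assumption: "*" and "$" are dropped, so "*$" acts as an empty alternative.
expand_pattern <- function(pattern) {
  chars <- strsplit(gsub("[*$]", "", pattern), "")[[1]]
  pos <- 1L  # index of the next character to read

  # alternation: collect the options of each "|"-separated sequence
  parse_alt <- function() {
    out <- parse_seq()
    while (pos <= length(chars) && chars[pos] == "|") {
      pos <<- pos + 1L            # skip "|"
      out <- c(out, parse_seq())
    }
    out
  }

  # sequence: paste every prefix built so far onto every option of the next piece
  parse_seq <- function() {
    out <- ""
    while (pos <= length(chars) && !(chars[pos] %in% c("|", ")"))) {
      if (chars[pos] == "(") {
        pos <<- pos + 1L          # skip "("
        piece <- parse_alt()      # options of the nested group
        pos <<- pos + 1L          # skip ")" (assumes balanced parentheses)
      } else {
        piece <- chars[pos]       # a single literal character
        pos <<- pos + 1L
      }
      out <- as.vector(outer(out, piece, paste0))
    }
    out
  }

  unique(parse_alt())
}

res <- expand_pattern("(ost|west)europäisch(*$|e(*$|r|s|n|m))")
```

For the example this gives the 12 strings from the question, without "westeuropäischm". Because "(*$" is not standard regex syntax, a check with `grepl` is probably safer against an equivalent conventional pattern, e.g. `all(grepl("^(ost|west)europäisch(e[rsnm]?)?$", res))`.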

Ben Bolker