str_extract_all not working with regex, R

Question

I'm trying this:

string  <- ":  FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1  FC Disponibilidad"


str_extract_all(string, "FC.*1")

It gives back this result:

[1] "FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1"

What I want is this:

[1] "FC Protocolo1  FC Comunicacion entre Profesionales1"

What should I change?

score 4 · Answer 1 · answered Mar 30 '23 at 08:46

You can use

library(stringr)
string  <- ":  FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1  FC Disponibilidad"
str_extract_all(string, "\\bFC\\b(?:(?!\\bFC\\b).)*?1")

See the R demo. Output:

[[1]]
[1] "FC Protocolo1"                       
[2] "FC Comunicacion entre Profesionales1"

See the regex demo. Note: if there must be no digit/letter/underscore after 1, add another \b there.
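To illustrate the note about the trailing \b, here is a small sketch. The input string2 is a hypothetical example (not from the question) in which one chunk ends in "12" instead of "1".

```r
library(stringr)

# Hypothetical input: the first chunk ends in "12", the second in "1".
string2 <- "FC Protocolo12  FC Comunicacion1"

# Without a trailing \b, the match may stop at a 1 that sits inside a longer number.
without_boundary <- str_extract_all(string2, "\\bFC\\b(?:(?!\\bFC\\b).)*?1")[[1]]

# With a trailing \b, the 1 must be followed by a non-word char or the end of the string.
with_boundary <- str_extract_all(string2, "\\bFC\\b(?:(?!\\bFC\\b).)*?1\\b")[[1]]

print(without_boundary)
print(with_boundary)
```

Here without_boundary should contain "FC Protocolo1" and "FC Comunicacion1", while with_boundary should contain only "FC Comunicacion1".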

Pattern details

\bFC\b - a whole word FC
(?:(?!\bFC\b).)*? - any single char other than line break chars that does not start a whole word FC char sequence, matched zero or more times but as few times as possible (if you need to match across line breaks, add (?s) at the start of the pattern)
1 - a 1 char.
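As a rough comparison of the pattern pieces described above, the sketch below runs the question's greedy pattern, a plain lazy variant, and the lookahead-guarded pattern on the same string. The variable names greedy, lazy and tempered are only illustrative.

```r
library(stringr)

string <- ":  FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1  FC Disponibilidad"

# Greedy: .* runs from the first FC to the last 1 in the string.
greedy <- str_extract_all(string, "FC.*1")[[1]]

# Lazy: .*? stops at the first 1, but the match still begins at the first FC,
# so the unwanted "FC Relacionado con Paciente" chunk is still included.
lazy <- str_extract_all(string, "FC.*?1")[[1]]

# Guarded: the negative lookahead keeps the match from running past another FC word.
tempered <- str_extract_all(string, "\\bFC\\b(?:(?!\\bFC\\b).)*?1")[[1]]

print(greedy)
print(lazy)
print(tempered)
```

Making the quantifier lazy alone does not appear to be enough here, because the regex engine still starts matching at the leftmost FC; it is the lookahead that forces each match to stay within a single FC chunk.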

Always learn a lot from your regex answers, +1! – ThomasIsCoding Mar 30 '23 at 09:08

score 2 · Answer 2 · Mohamed Desouky · 2023-03-30T09:59:53.157

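You can use str_split and str_detect as follows (string is the variable from the question):

library(stringr)
# Prefix every FC with "_" and split on it, so each piece starts at an FC
s <- str_split(gsub('(FC)', '_\\1', string), '_')[[1]]
# Keep only the pieces that contain FC followed later by a 1
s[str_detect(s, 'FC.*1')]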

Output

[1] "FC Protocolo1  "                       
[2] "FC Comunicacion entre Profesionales1  "

If the strings are in a tibble column named string, you can use

s <- str_split(gsub('(FC)', '_\\1', string), '_')

new_column <- sapply(s, \(x) paste0(x[str_detect(x, 'FC.*1')], collapse = ''))
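A minimal sketch of the column-wise variant, assuming a plain data frame named df (a hypothetical name, used here instead of a tibble to avoid extra packages) with a column called string. Unlike the answer's code, this sketch also trims each piece and joins the pieces with two spaces, which is an added assumption about the desired format.

```r
library(stringr)

# Hypothetical two-row data frame; only the column name `string` comes from the answer.
df <- data.frame(
  string = c(
    ":  FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1  FC Disponibilidad",
    "FC Disponibilidad  FC Protocolo1"
  )
)

# Mark the start of each FC chunk with "_" and split on it: one list element per row.
s <- str_split(gsub("(FC)", "_\\1", df$string), "_")

# Per row: keep the chunks containing FC ... 1, trim the padding, and join them again.
df$new_column <- sapply(s, function(x) {
  paste(str_trim(x[str_detect(x, "FC.*1")]), collapse = "  ")
})

print(df$new_column)
```

This approach relies on the text containing no underscores of its own and on FC occurring only as the chunk marker, since gsub inserts "_" before every FC, including one inside a longer word.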

edited Mar 30 '23 at 09:59

answered Mar 30 '23 at 09:11

Mohamed Desouky


See the above update. – Mohamed Desouky Mar 30 '23 at 10:00

score 1 · Answer 3 · answered Mar 30 '23 at 09:07

Another option might be grep + strsplit (but I would say my solution is not as advanced or efficient as @Wiktor's)

> grep("FC.*1", strsplit(string, "\\s+(?=FC)", perl = TRUE)[[1]], value = TRUE)
[1] "FC Protocolo1"
[2] "FC Comunicacion entre Profesionales1"
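The one-liner above can be unpacked into its steps, roughly as follows. The names pieces, hits and joined are illustrative, and the final paste step is an addition that rebuilds the single string the question asked for.

```r
string <- ":  FC Relacionado con Paciente  FC Protocolo1  FC Comunicacion entre Profesionales1  FC Disponibilidad"

# Split on whitespace that is immediately followed by FC; the lookahead
# leaves the FC itself at the start of the next piece.
pieces <- strsplit(string, "\\s+(?=FC)", perl = TRUE)[[1]]

# Keep only the pieces that contain FC followed later by a 1.
hits <- grep("FC.*1", pieces, value = TRUE)

# Join the hits with two spaces to get one string, as in the question.
joined <- paste(hits, collapse = "  ")

print(pieces)
print(hits)
print(joined)
```

Because the split pattern consumes the whitespace, the resulting pieces carry no trailing spaces, which is why this output differs from the str_split variant.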
