I have a text.

x = "(F1) sample text (F2) (F3) (S3)"

I want the output below.

"(F1)" "(F2)" "(F3)"

I know how to extract it with the stringr package.

library(stringr)
str_extract_all(x, '\\(F[1-3]\\)')

I am curious how to implement the negation of a pattern in gsub(). Some programming languages support (?! for inverting a regex pattern, but it does not work in R.

gsub("\\(F[1-3]\\)", '', x)
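One way to get the "everything except the pattern" effect with gsub() is to match the wanted pattern in a capture group, let an alternative consume any other character, and write back only the group. This is a minimal sketch rather than a definitive answer: the variable name `kept` is made up for illustration, and the approach relies on an unmatched group being replaced by an empty string.

```r
x <- "(F1) sample text (F2) (F3) (S3)"

# Either capture a wanted token, or consume one other character.
# The replacement "\\1" writes the token back and drops everything else.
kept <- gsub("(\\(F[1-3]\\))|.", "\\1", x)
print(kept)
```

The result is a single string with the tokens run together, not a character vector like the one str_extract_all() returns.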

  • `gsub` makes substitutions in a vector of strings, but always returns as many strings as you pass in. It's parallel to `stringr::str_replace_all`. – alistaire Jun 17 '18 at 16:13
  • Thanks. I want to know how to make gsub( ) understand that I want replacement of 'negation / inverse' of pattern. – john Jun 17 '18 at 16:15
  • The usual strategy is to either use capture groups to extract what you want, e.g. `gsub('\\D*(\\d+)\\D*', '\\1', 'foo1234bar')` or sub out what you don't: `gsub('\\D', '', 'foo1234bar')` – alistaire Jun 17 '18 at 16:22
  • For this particular case, `strsplit` is more useful: `x_split <- strsplit(x, ' [^(].*[^)] |\\s')[[1]]; grep('F', x_split, value = TRUE)` Technically the equivalent of `str_extract_all` is the combination of `gregexpr` and `regmatches`: `regmatches(x, gregexpr('\\(F[1-3]\\)', x))[[1]]`... but people don't use that much, as far as I've seen – alistaire Jun 17 '18 at 16:24
  • @WiktorStribiżew It's mentioned in the question. The same as what `str_extract_all` returns, i.e. (F1) (F2) (F3) – john Jun 17 '18 at 16:24
  • Thanks @alistaire . Appreciate it! Actually I just want to learn negation/inverse of pattern which can be implemented in other programming languages. – john Jun 17 '18 at 16:26
  • Ok, but what is the pattern you want to negate? There is a way to match everything but some text, but what text do you want to "unmatch"? Actually, most will ask, why??? – Wiktor Stribiżew Jun 17 '18 at 16:30
  • You're asking for something that doesn't exist quite as such in regex. There are ways to negate smaller pieces, e.g. character classes, and some regex functions like `grep` have an `invert` parameter, but there is no structural way to invert a match. `(?!...)` is a negative lookahead, which has to be preceded by a positive match. To get it in base R regex, you have to specify `perl = TRUE`. – alistaire Jun 17 '18 at 16:30
  • @alistaire https://regex101.com/r/cO8lqs/20 I am unable to replicate it in R: `x = "third drone"; gsub('d(?!r)', '', x)` – john Jun 17 '18 at 16:34
  • Literally, `regmatches('third drone', regexpr('d(?!r)', 'third drone', perl = TRUE))`; with `sub`/`gsub` `sub('.*(d)(?!r).*', '\\1', 'third drone', perl = TRUE)`, though you don't really need the lookahead if you're deleting the rest anyway: `sub('.*(d)r.*', '\\1', 'third drone')` – alistaire Jun 17 '18 at 16:39
  • `str_` methods allow lookarounds as ICU regex library supports them natively. – Wiktor Stribiżew Jun 17 '18 at 16:40
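
The strategies from the comments above can be sketched in base R as follows. This is an illustrative sketch, not an accepted answer: the names `matches` and `no_dr` are hypothetical, and the lookahead call assumes the PCRE engine selected by `perl = TRUE`, as one of the comments notes.

```r
x <- "(F1) sample text (F2) (F3) (S3)"

# Base R counterpart of str_extract_all(): locate all matches, then pull them out.
matches <- regmatches(x, gregexpr("\\(F[1-3]\\)", x))[[1]]
print(matches)

# Negative lookahead: delete every "d" that is not followed by "r".
# perl = TRUE switches gsub() to PCRE, which understands (?!...).
no_dr <- gsub("d(?!r)", "", "third drone", perl = TRUE)
print(no_dr)
```

The first call returns a character vector of the matched tokens, while the second shows that the lookahead from the regex101 link works once the PCRE engine is requested.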

0 Answers