
When writing regular expressions for R's gsub() function, I thought I could use capture groups by enclosing patterns in (...) and then refer to the captured text with \\1 to "grab" it.

However, that doesn't appear to work in this example:

> gsub("([^-]+$)", "\\1", "xxx-yyy-zzz-abc")
[1] "xxx-yyy-zzz-abc"

The actual output is xxx-yyy-zzz-abc, but I expected abc. The regular expression ([^-]+$) captures the part of the string after the last hyphen (-), and I confirmed this with a regex demo.

Why isn't my output abc?

If I instead replace the captured pattern with an empty string, everything works as expected: abc is removed from the original string.

> gsub("([^-]+$)", "", "xxx-yyy-zzz-abc")
[1] "xxx-yyy-zzz-"
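A quick way to see which part of the string gsub() actually rewrites is to extract the match itself. The sketch below uses only base R (regexpr() and regmatches()), and the variable names are just for illustration. It shows that the match is only the trailing abc. Replacing that match with \\1, which holds the same abc, therefore gives back the original string:

```r
x <- "xxx-yyy-zzz-abc"
pattern <- "([^-]+$)"

# regexpr() locates the match; regmatches() returns the matched text.
matched <- regmatches(x, regexpr(pattern, x))
print(matched)    # [1] "abc"

# gsub() rewrites only the matched span and copies everything else through.
# The replacement "\\1" is the captured group, i.e. the same "abc" that was matched.
unchanged <- gsub(pattern, "\\1", x)
print(unchanged)  # [1] "xxx-yyy-zzz-abc"

# Wrapping the backreference in markers makes the rewritten span visible.
marked <- gsub(pattern, "<\\1>", x)
print(marked)     # [1] "xxx-yyy-zzz-<abc>"
```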

What should I put in the blanks below to get the output abc? And what went wrong?

> gsub("______", "______", "xxx-yyy-zzz-abc")
[1] "abc"
Display name
  • You *replace* matches with `gsub`. Use `regmatches` to *extract* ([example](https://stackoverflow.com/a/23901600/3832970)) – Wiktor Stribiżew Jan 08 '20 at 19:18
  • @Wiktor Stribiżew, but in the question you reference there's an answer by @Ragy Isaac that seems to mimic my approach, which still leaves me wondering why my `gsub("([^-]+$)", "\\1", "xxx-yyy-zzz-abc")` doesn't function as _I_ expect. @Ragy Isaac's solution to the example question was `gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")`, which does indeed behave as expected. The parentheses `()` capture the string and the `\\2` spits it out. No? – Display name Jan 08 '20 at 19:36
  • It does not contradict the main idea: `gsub` / `sub` *replaces* matches. `[^-]+$` is a *matching* pattern since you want to get `abc` from `xxx-yyy-zzz-abc`. This answers your *Why isn't my output abc* question. If you want to use `gsub` to remove all unwanted parts of a string to get the part you want, go ahead. – Wiktor Stribiżew Jan 08 '20 at 19:37
  • You are matching only on the pattern you are replacing, which means the result will be the same. Match on the entire line instead: `gsub(".*-([^-]+)$", "\\1", "xxx-yyy-zzz-abc")` – BigFinger Jan 08 '20 at 23:46
  • @BigFinger I ended up using `gsub("^.*-", "", "xxx-yyy-zzz-abc")` but that's a great idea as well. – Display name Jan 09 '20 at 19:07
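The three fixes suggested in the comments can be compared side by side. This is a minimal sketch in base R, assuming a single input string. It uses sub() because each pattern can match at most once here; gsub(), as written in the comments, returns the same result. The variable names are illustrative only.

```r
x <- "xxx-yyy-zzz-abc"

# 1. Extract instead of replace: regmatches() returns the matched text itself.
extracted <- regmatches(x, regexpr("[^-]+$", x))

# 2. Match the whole string and keep only the capture group:
#    greedy ".*-" consumes everything up to and including the last hyphen,
#    so replacing the full match with \\1 leaves just the captured part.
via_backref <- sub(".*-([^-]+)$", "\\1", x)

# 3. Delete the unwanted prefix: greedy "^.*-" runs to the last hyphen,
#    and replacing it with "" keeps whatever follows.
via_removal <- sub("^.*-", "", x)

print(c(extracted, via_backref, via_removal))  # [1] "abc" "abc" "abc"
```

Approach 1 matches the "extract, don't replace" framing, while approaches 2 and 3 stay with gsub()/sub() by making the match cover everything that should disappear.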
