
Suppose I run the following

txt <- "client:A, field:foo, category:bar"
grep("field:[A-z]+", txt, value = TRUE, perl = TRUE)

Based on regexr.com I expected I would get field:foo, but instead I get the entire string. Why is this?

T'n'E

1 Answer


You seem to want to extract the value. Use regmatches:

txt <- "client:A, field:foo, category:bar"
regmatches(txt, regexpr("field:[[:alpha:]]+", txt))
# => [1] "field:foo"


To match multiple occurrences, replace regexpr with gregexpr.
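As a minimal sketch of the gregexpr variant, assuming an input that contains the key more than once (the second `field:` entry is made up for illustration and is not in the original question):

```r
# Hypothetical input with two "field:" entries (not from the original question)
txt <- "client:A, field:foo, category:bar, field:baz"

# gregexpr() finds every match in each element; regmatches() then returns
# a list with one character vector of matches per input element
all_matches <- regmatches(txt, gregexpr("field:[[:alpha:]]+", txt))

print(all_matches[[1]])
```

Because the result is a list, use `all_matches[[1]]` (or `unlist()`) to get a plain character vector for a single input string.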

Or use stringr str_extract_all:

library(stringr)
str_extract_all(txt, "field:[a-zA-Z]+")

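Another point is that [A-z] matches more than ASCII letters: the range also covers the six characters that sit between Z and a in ASCII, namely [, \, ], ^, _ and `. Use [[:alpha:]] in a TRE (regexpr / gregexpr without perl=TRUE) or ICU (stringr) regex to match any letter.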
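A quick way to see the difference is to test single characters against both classes. This is only a sketch; the test vector `x` is made up for illustration, and perl = TRUE is used for the range as in the question:

```r
# Single-character test strings: two letters, three punctuation
# characters that fall between "Z" and "a" in ASCII, and a digit
x <- c("a", "Z", "_", "^", "[", "1")

# The [A-z] range spans code points 65..122, so it accepts the punctuation too
az <- grepl("^[A-z]$", x, perl = TRUE)

# The POSIX class accepts only the letters
alpha <- grepl("^[[:alpha:]]$", x)

print(data.frame(x = x, A_to_z = az, alpha = alpha))
```

Here `az` is TRUE for everything except "1", while `alpha` is TRUE only for "a" and "Z".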

Wiktor Stribiżew
  • This works very nice, but I still don't understand why the original attempt doesn't work? – T'n'E Aug 21 '17 at 12:28
  • @T'n'E In your code, you use [`grep`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html). This function returns character vectors that match (or do not match if you invert the operation) the pattern. It does not *extract* the matches from the character vectors. – Wiktor Stribiżew Aug 21 '17 at 12:32
  • Ah, so I misunderstood the value parameter as extracting the value of the _match_, not returning the _matched string_. Confusing I think, but got it - thanks! – T'n'E Aug 21 '17 at 13:03
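
The distinction discussed in the comments can be seen side by side. This is a sketch only: the second vector element is made up so that one element does not match.

```r
# Two input strings; only the first contains a "field:" entry
# (the second element is hypothetical, added for illustration)
x <- c("client:A, field:foo, category:bar",
       "client:B, category:baz")
pattern <- "field:[[:alpha:]]+"

# grep() selects elements: by default it returns the indices that match
idx <- grep(pattern, x)

# With value = TRUE it returns the matching elements themselves, in full
whole <- grep(pattern, x, value = TRUE)

# regmatches() + regexpr() returns only the matched substring
part <- regmatches(x, regexpr(pattern, x))

print(idx)
print(whole)
print(part)
```

So `value = TRUE` switches grep from returning indices to returning whole elements; it never trims an element down to the part that matched.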