Match letters in R regex

Question

Suppose I run the following:

txt <- "client:A, field:foo, category:bar"
grep("field:[A-z]+", txt, value = TRUE, perl = TRUE)

Based on regexr.com, I expected to get field:foo, but instead I get the entire string. Why is this?

Wiktor Stribiżew · Accepted Answer · 2017-08-22T06:08:24.763


You seem to want to extract the matched substring rather than the whole element. Use regmatches together with regexpr:

txt <- "client:A, field:foo, category:bar"
regmatches(txt, regexpr("field:[[:alpha:]]+", txt))
# => [1] "field:foo"


To match multiple occurrences, replace regexpr with gregexpr.
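A minimal sketch of the difference, using a hypothetical string txt2 that contains field: twice (txt2, first and all_fields are illustrative names, not from the original answer). Because gregexpr returns a list with one element per input string, regmatches returns a list as well.

```r
# Hypothetical input with two "field:" entries
txt2 <- "field:foo, client:A, field:baz"

# regexpr finds only the first match in each string
first <- regmatches(txt2, regexpr("field:[[:alpha:]]+", txt2))
print(first)             # "field:foo"

# gregexpr finds all matches; regmatches then returns a list
all_fields <- regmatches(txt2, gregexpr("field:[[:alpha:]]+", txt2))
print(all_fields[[1]])   # "field:foo" "field:baz"
```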

Or use str_extract_all from the stringr package:

library(stringr)
# Returns a list: one character vector of matches per input string
str_extract_all(txt, "field:[a-zA-Z]+")

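Another point is that [A-z] matches more than ASCII letters. With perl = TRUE, as in the question, the range runs by code point from A (0x41) to z (0x7A), so it also covers [, \, ], ^, _ and ` (0x5B to 0x60). To match any letter, use [[:alpha:]] in a TRE regex (regexpr / gregexpr without perl = TRUE) or an ICU regex (stringr).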
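A quick check of the [A-z] problem under perl = TRUE, as used in the question. The vector between and the string field:foo_bar are made-up test inputs chosen to hit the characters that sit between Z and a.

```r
# Characters whose code points (0x5B-0x60) lie between "Z" (0x5A) and "a" (0x61)
between <- c("[", "\\", "]", "^", "_", "`")

# With perl = TRUE, [A-z] accepts every one of them
print(grepl("^[A-z]$", between, perl = TRUE))        # all TRUE

# [[:alpha:]] rejects them all
print(grepl("^[[:alpha:]]$", between, perl = TRUE))  # all FALSE

# Effect on the question's pattern: the underscore slips into the [A-z] match
s <- "field:foo_bar"
print(regmatches(s, regexpr("field:[A-z]+", s, perl = TRUE)))  # "field:foo_bar"
print(regmatches(s, regexpr("field:[[:alpha:]]+", s)))         # "field:foo"
```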

edited Aug 22 '17 at 06:08

answered Aug 21 '17 at 12:11

Wiktor Stribiżew


This works very nicely, but I still don't understand why the original attempt doesn't work. – T'n'E Aug 21 '17 at 12:28
@T'n'E In your code, you use [`grep`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html). With value = TRUE, this function returns the elements of the character vector that match the pattern (or, with invert = TRUE, those that do not match). It does not *extract* the matches from those elements. – Wiktor Stribiżew Aug 21 '17 at 12:32

Ah, so I misunderstood the value parameter: I thought it returned the _match_, not the _matched string_. Confusing, I think, but I got it. Thanks! – T'n'E Aug 21 '17 at 13:03
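A short sketch contrasting the calls discussed in the comments, run on the question's txt. The names pattern, whole, idx and part are illustrative only.

```r
txt <- "client:A, field:foo, category:bar"
pattern <- "field:[[:alpha:]]+"

# grep with value = TRUE: the whole matching element comes back
whole <- grep(pattern, txt, value = TRUE)
print(whole)   # "client:A, field:foo, category:bar"

# grep with the default value = FALSE: only the index of the matching element
idx <- grep(pattern, txt)
print(idx)     # 1

# regmatches + regexpr: just the matched substring
part <- regmatches(txt, regexpr(pattern, txt))
print(part)    # "field:foo"
```

In other words, value only switches grep between returning indices and returning the matching elements themselves; neither form trims an element down to the matched text.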
