Suppose I have the following strings and want to use grep to see which match:

business_metric_one
business_metric_one_dk
business_metric_one_none
business_metric_two
business_metric_two_dk
business_metric_two_none

And so on for various other metrics. I want to only match the first one of each group (business_metric_one and business_metric_two and so on). They are not in an ordered list so I can't index and have to use grep. At first I thought to do:

.*metric.*[^_dk|^_none]$

But this doesn't seem to work. Any ideas?

oguz ismail
vashts85
  • What is the criterion here to exclude a value? If it ends with `_dk` or `_none`? But still contains `metric`? – Wiktor Stribiżew Oct 23 '17 at 21:19
  • There's an inclusion criterion (needs to include `metric` or some string), but it should fail if it includes either `_dk` or `_none`. – vashts85 Oct 23 '17 at 21:20

2 Answers


You need to use a PCRE pattern to filter the character vector:

x <- c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")
grep("metric(?!.*_(?:dk|none))", x, value=TRUE, perl=TRUE)
## => [1] "business_metric_one" "business_metric_two"

See the R demo

The metric(?!.*_(?:dk|none)) pattern matches

  • metric - the literal substring metric
  • (?!.*_(?:dk|none)) - a negative lookahead that fails the match if, anywhere to the right of metric on the same line (after any 0+ chars other than line break chars), there is a _ followed by either dk or none.

See the regex demo.

NOTE: If you only need the values that contain metric and do not end with _dk or _none, use a variation, metric.*$(?<!_dk|_none), where the (?<!_dk|_none) negative lookbehind fails the match if the string ends with either _dk or _none.
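To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same vector. The extra value business_metric_dk_total is hypothetical and is not part of the original data; it was added only because it contains _dk in the middle rather than at the end.

```r
# Original data plus one hypothetical value with "_dk" in the middle
x2 <- c("business_metric_one", "business_metric_one_dk",
        "business_metric_one_none", "business_metric_two",
        "business_metric_two_dk", "business_metric_two_none",
        "business_metric_dk_total")  # hypothetical, for illustration only

# Lookahead: reject if _dk or _none appears anywhere after "metric"
lookahead <- grep("metric(?!.*_(?:dk|none))", x2, value = TRUE, perl = TRUE)
lookahead
# [1] "business_metric_one" "business_metric_two"

# Lookbehind at the end: reject only if the string ends with _dk or _none
lookbehind <- grep("metric.*$(?<!_dk|_none)", x2, value = TRUE, perl = TRUE)
lookbehind
# [1] "business_metric_one" "business_metric_two" "business_metric_dk_total"
```

Both calls need perl=TRUE, since lookarounds are not supported by R's default regex engine.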

Wiktor Stribiżew
  • Two questions: can you explain what a PCRE pattern is? Not sure what the acronym even stands for. And, can you explain some of the notation in your regex like the parentheses, ?!, and so on? I'd love this to be a guide for others about lookahead negations. – vashts85 Oct 24 '17 at 16:36
  • @vashts85 PCRE stands for a Perl Compatible Regular Expression. See http://pcre.org. The answer won't become a guide as it would be too broad. You may learn [more about lookarounds at regular-expressions.info](https://www.regular-expressions.info/lookaround.html). Here is an [SO thread](https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups) about them. – Wiktor Stribiżew Oct 24 '17 at 16:58

You can also do something like this:

grep("^([[:alpha:]]+_){2}[[:alpha:]]+$", string, value = TRUE)
# [1] "business_metric_one" "business_metric_two"

or use grepl to match dk and none, then negate the logical when you're indexing the original string:

string[!grepl("(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"

or, more specifically:

string[!grepl("business_metric_[[:alpha:]]+_(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"
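Note that the unanchored (dk|none) pattern drops any string that contains dk or none anywhere. As a rough sketch, assuming only a trailing _dk or _none should exclude a value, the alternation can be anchored with _ and $. The value business_nonempty_metric below is hypothetical and was added only to show the difference.

```r
# Original data plus one hypothetical value that contains "none" mid-string
string2 <- c("business_metric_one", "business_metric_one_dk",
             "business_metric_one_none", "business_metric_two",
             "business_metric_two_dk", "business_metric_two_none",
             "business_nonempty_metric")  # hypothetical, for illustration only

# Unanchored: excludes anything containing "dk" or "none" anywhere
loose <- string2[!grepl("(dk|none)", string2)]
loose
# [1] "business_metric_one" "business_metric_two"

# Anchored: excludes only strings that end in "_dk" or "_none"
anchored <- string2[!grepl("_(dk|none)$", string2)]
anchored
# [1] "business_metric_one" "business_metric_two" "business_nonempty_metric"
```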

Data:

string = c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")
acylam