
A user asked me how to do this in How to italicize select words in a ggplot legend?, and I'm not happy with my workaround.

The aim is to wrap every element of a character vector in asterisks, except for given strings, which should stay outside the asterisks. Let's assume for this example that those strings are always found at the beginning. I am using an optional capture group for the first part and then wrapping the second group in asterisks. The problem arises when the excluded word stands alone and nothing follows it: the empty second group still gets wrapped, which gives "Valiant**".

I've included the desired output and some attempts in the code.

v <- head(rownames(mtcars))
## also does not work with (.*)?, (.+), or (.+)?
gsub("(Hornet |Valiant)?(.*)", "\\1\\*\\2\\*", v) 
#> [1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
#> [4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant**"

## desired output
ifelse(grepl("Valiant", v), v, gsub("(Hornet )?(.*)", "\\1\\*\\2\\*", v) )
#> [1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
#> [4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant"
tjebo

3 Answers


Neither of the regex engines that can be used with gsub (TRE by default, PCRE with perl=TRUE) supports a conditional replacement pattern.

You can use

v <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive","Hornet Sportabout","Valiant")
gsub("^(?:Hornet|Valiant)\\s*(*SKIP)(*F)|(.+)", "*\\1*", v, perl=TRUE)

See the regex demo and the R demo online.

Output:

[1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
[4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant"   

To make sure the first words are matched as whole words, add \b: "^(?:Hornet|Valiant)\\b\\s*(*SKIP)(*F)|(.+)".

Make sure to use perl=TRUE: the (*SKIP) and (*F) verbs are PCRE features.

Regex details:

  • ^(?:Hornet|Valiant)\s*(*SKIP)(*F) - match Hornet or Valiant at the start of the string, then zero or more whitespaces, and once matched, discard and fail the match, and proceed to look for the next match from the failure position
  • | - or
  • (.+) - matches one or more chars other than line break chars as many as possible (the rest of the string).
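As a minimal sketch of how the pieces above fit together, the pattern can be built from a vector of words to leave unstarred. The helper name `star_except` and the extra input "Valiants" are hypothetical, added only for illustration, and the sketch assumes the words contain no regex metacharacters.

```r
# Hypothetical helper (not part of the answer): wrap everything in
# asterisks except the listed leading words.
star_except <- function(x, keep) {
  # Assumption: `keep` holds plain words without regex metacharacters.
  alternation <- paste(keep, collapse = "|")
  # \b keeps "Valiant" from matching the start of a longer word.
  pattern <- paste0("^(?:", alternation, ")\\b\\s*(*SKIP)(*F)|(.+)")
  # (*SKIP)(*F) are PCRE verbs, so perl = TRUE is required.
  gsub(pattern, "*\\1*", x, perl = TRUE)
}

# "Valiants" is a made-up input that shows the effect of \b.
v <- c("Mazda RX4", "Hornet 4 Drive", "Valiant", "Valiants")
res <- star_except(v, c("Hornet", "Valiant"))
print(res)
```

With the word boundary in place, "Valiants" is not treated as an excluded word and is wrapped as a whole; without \b, only its trailing "s" would be wrapped.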
Wiktor Stribiżew

One more solution is to use a possessive quantifier for the first group and a one-or-more quantifier inside the second:

^(Hornet ?|Valiant ?)?+(.+)

This way, if Hornet or Valiant is matched at the beginning of the string, no backtracking into the first group occurs, and the string is matched (and subsequently substituted) only if something follows that word.

Demo here.
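A minimal sketch of how this pattern might be used from R, assuming the replacement string "\\1*\\2*" (the answer gives only the regex) and perl = TRUE, since possessive quantifiers are a PCRE feature:

```r
v <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
       "Hornet 4 Drive", "Hornet Sportabout", "Valiant")

# ?+ makes the optional first group possessive: once "Valiant" is
# consumed, the engine cannot give it back to satisfy (.+), so a
# lone "Valiant" does not match and is left unchanged.
# The replacement string is an assumption, not taken from the answer.
res <- gsub("^(Hornet ?|Valiant ?)?+(.+)", "\\1*\\2*", v, perl = TRUE)
print(res)
```

Because the optional space is part of the first group, it stays outside the asterisks in the result.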

markalex

A less in-depth and more hacky answer, but one that is easier to understand.

gsub performs a substitution only where the string matches the provided regex. So, to stop * from appearing, you can make the regex fail to match your input.

For the example provided in the question, you can do this with a negative lookahead. The result would look like this:

^(?!(?:Hornet|Valiant)$)(Hornet|Valiant)?(.*)$

Demo here.
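A minimal sketch of the lookahead idea in R, under two assumptions that are not in the answer: the replacement string "\\1*\\2*", and a trailing space moved into the first group. With the pattern exactly as written above, the separating space is captured by the second group and would end up inside the asterisks.

```r
v <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
       "Hornet 4 Drive", "Hornet Sportabout", "Valiant")

# The negative lookahead rejects strings that consist solely of an
# excluded word, so gsub finds no match and leaves them untouched.
# Adaptation (assumption): "Hornet " and "Valiant " include the space
# in group 1 so that it stays outside the asterisks.
pattern <- "^(?!(?:Hornet|Valiant)$)(Hornet |Valiant )?(.*)$"

# Lookaheads need the PCRE engine, hence perl = TRUE.
res <- gsub(pattern, "\\1*\\2*", v, perl = TRUE)
print(res)
```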

markalex