
I am trying to remove a parenthesis from a string in R and run into the following error:

string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) # Breaks
# Error in gsub("log(", "", string) :
#   invalid regular expression 'log(', reason 'Missing ')''
M--
mike

3 Answers


Escape the parenthesis with a double-backslash:

gsub("log\\(", "", string)

(Obligatory: http://xkcd.com/234/)
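Why two backslashes: R's string parser consumes one of them, so the regex engine receives `log\(`, where `\(` means a literal parenthesis rather than the start of a group. The sketch below checks this and also shows two variants that are not part of the original answer: escaping the closing parenthesis too, and using a `[(]` character class instead of a backslash.

```r
string <- "log(M)"

# In R source, "\\(" is a two-character string: a backslash followed by "(".
nchar("\\(")                      # 2
writeLines("log\\(")              # prints log\(  (what the regex engine sees)

# The escaped "(" matches a literal parenthesis instead of opening a group.
gsub("log\\(", "", string)        # "M)"

# Escaping the closing parenthesis as well strips the whole wrapper.
gsub("log\\(|\\)", "", string)    # "M"

# A bracket expression also makes "(" literal, with no backslash needed.
gsub("log[(]", "", string)        # "M)"
```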

Ben Bolker

Ben's answer gives you the good generally applicable way of doing this.

Alternatively, in your situation you could use the fixed=TRUE argument, like this:

gsub("log(", "", string, fixed=TRUE)
# [1] "M)"

It is appropriate whenever the `pattern` argument to gsub() is a plain character string containing the literal sequence of characters you are searching for. It's convenient because you can type the pattern exactly as it appears in the text, with no escapes.
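A small sketch of the difference, using `.` as a second metacharacter. The vector `x` is an illustrative example, not from the question.

```r
string <- "log(M)"

# With fixed = TRUE the pattern is matched literally, so "(" needs no escape.
gsub("log(", "", string, fixed = TRUE)   # "M)"

# The same applies to other metacharacters, e.g. ".".
x <- "a.b.c"
gsub(".", "", x)                 # ""    (regex "." matches any character)
gsub(".", "", x, fixed = TRUE)   # "abc" (only literal dots are removed)
gsub("\\.", "", x)               # "abc" (the regex version needs an escape)
```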

Josh O'Brien
    @BenBolker -- Odd. It should be `fixed=TRUE`, but `fixed="TRUE"`, `fixed="true"`, and `fixed="True"` all work as well. Other combos like `fixed="TRue"` or `fixed="TruE"` don't. Relatedly (?) I recently realized that `rnorm("9")` (for instance) works, when I'd say it shouldn't. (FWIW, `rnorm("9", "1")` *doesn't* work.) I wonder if these cases have something to do with the way the underlying C-code processes arguments that are passed on to it. What do you think? – Josh O'Brien Oct 13 '14 at 17:55
    Interesting. The answer is in `?as.logical`: `‘as.logical’ attempts to coerce its argument to be of logical type. For ‘factor’s, this uses the ‘levels’ (labels). Like ‘as.vector’ it strips attributes including names. Character strings ‘c("T", "TRUE", "True", "true")’ are regarded as true, ‘c("F", "FALSE", "False", "false")’ as false, and all others as ‘NA’.` – Ben Bolker Oct 13 '14 at 18:04
    @BenBolker Nice find. I wonder if there's a somehow related/analogous conversion going on in the `rnorm("9")` case, and whether it only happens for that first argument because it's an argument with no default value. Just checked `seq_len("9")` (as the first `.Primitive` function that came to mind), and it shows similar behavior... – Josh O'Brien Oct 13 '14 at 18:11
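The coercion rules quoted from `?as.logical` can be checked directly. This sketch only exercises `as.logical()`; that `gsub()` coerces its `fixed` argument with the same rules is an inference from the behavior described in the comments above, not something tested here.

```r
# Strings that as.logical() accepts as TRUE.
as.logical(c("T", "TRUE", "True", "true"))      # TRUE TRUE TRUE TRUE

# Strings that as.logical() accepts as FALSE.
as.logical(c("F", "FALSE", "False", "false"))   # FALSE FALSE FALSE FALSE

# Any other spelling becomes NA.
as.logical(c("TRue", "TruE", "yes"))            # NA NA NA
```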

If you are not a regex specialist (many of us are not!), I find it more straightforward to remove the unneeded text and the parens separately, provided your problem allows treating them independently.

The question seems to ask only about removing the parens, so you could use:

gsub(paste(c("[(]", "[)]"), collapse = "|"), "", string)

This results in the string without parens: "logM"

If you also want to remove the "M":

gsub(paste(c("M", "[(]", "[)]"), collapse = "|"), "", string)

This returns "log".
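For comparison, a single bracket expression does the same job as the `paste()`-built alternation, because inside `[...]` the parentheses are ordinary characters. This is an alternative to the answer's approach, not a restatement of it. Like the `"M"` alternative above, `[M()]` removes every capital M in the string, not just the one inside the parens.

```r
string <- "log(M)"

# One character class covers both parentheses.
gsub("[()]", "", string)                  # "logM"

# Adding "M" to the class drops it as well.
gsub("[M()]", "", string)                 # "log"

# The pattern built with paste() in the answer, for reference.
paste(c("[(]", "[)]"), collapse = "|")    # "[(]|[)]"
```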

glenn_in_boston