
I would like to use gsub to replace every occurrence of a backslash in a string with 2 backslashes.

Currently, what I have tried is gsub("\\\\", "\\", x). This doesn't seem to work, though. However, if I change the expression to replace each backslash with "a" instead, it works fine.

> gsub("\\\\", "\\", "\\")
[1] ""
> gsub("\\\\", "a", "\\")
[1] "a"
> gsub("\\\\", "\\\\", "\\")
[1] "\\"

The last result is only a single backslash; R just prints 2 because print() shows strings in their escaped form. Using nchar confirms that the length is 1.

What causes this behavior? The second argument to gsub isn't a regular expression, so 4 backslashes in the string literal should yield a replacement string of 2 backslashes. It makes even less sense that the first gsub call above returns an empty string.

Jon Claus
  • I think you have your parameter order confused. The first parameter is what you're looking for. So if you want to replace every backslash with two, don't you want: `gsub("\\\\", "\\\\\\\\", x)`? Also, if you `cat()` the value rather than using the default implicit `print()`, you don't see the escaped part of the backslash. – MrFlick Dec 15 '14 at 19:54
  • See also [here](http://stackoverflow.com/questions/11806501/backslash-in-r-string) – David Arenburg Dec 15 '14 at 20:18

1 Answer


Here's what you need:

gsub("\\\\", "\\\\\\\\", "\\")
[1] "\\\\"

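The reason that you need four backslashes to represent one literal backslash is that "\" is an escape character both in R string literals and in the regex engine to which you're ultimately passing your patterns. If you were talking directly to the regex engine, you'd use "\\" to indicate a literal backslash. But in order to get R to pass "\\" on to the regex engine, you need to type "\\\\". The same applies to the replacement argument: it isn't a regex, but the backslash is still treated as an escape character there (it introduces backreferences such as "\\1"), so each literal backslash in the output also has to be typed as "\\\\".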
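To make the two layers of escaping concrete, here is a small sketch that counts the characters at each stage. The variable names are only for illustration; the calls are the same as in the example above.

```r
# Layer 1: the R parser turns each "\\" in a string literal into one backslash.
pattern     <- "\\\\"      # 2 characters after parsing
replacement <- "\\\\\\\\"  # 4 characters after parsing
x           <- "\\"        # 1 character after parsing

nchar(pattern)      # 2
nchar(replacement)  # 4
nchar(x)            # 1

# Layer 2: gsub() treats the backslash as an escape in both the pattern and
# the replacement, so each pair it receives stands for one literal backslash.
result <- gsub(pattern, replacement, x)
nchar(result)       # 2

# writeLines() shows the raw characters instead of the escaped form
# that print() uses.
writeLines(result)
```

Checking with nchar() or writeLines() rather than the default print() avoids confusing the escaped display with the actual contents of the string.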

If you just want to double the backslashes, you might want to use this instead:

gsub("\\", "\\\\", "\\", fixed=TRUE)
[1] "\\\\"
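As a rough sketch of how the fixed = TRUE variant might be wrapped for reuse: the helper name double_backslashes and the sample path are made up for illustration. With fixed = TRUE only R's own string escaping is in play, so each "\\" in the source code is exactly one backslash in both arguments.

```r
# Hypothetical helper: double every backslash without involving the regex engine.
double_backslashes <- function(x) {
  gsub("\\", "\\\\", x, fixed = TRUE)
}

path    <- "C:\\Users\\me"          # the actual string contains 2 backslashes
doubled <- double_backslashes(path)

writeLines(path)     # shows the raw string with single backslashes
writeLines(doubled)  # shows the raw string with each backslash doubled

nchar(path)          # 11
nchar(doubled)       # 13
```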
Josh O'Brien
  • Yes, I found that by trial and error. I was wondering why the above behavior happened. The second argument isn't a regex, so the result should only be the same length as the second argument. In that example, the second argument is 4 characters long but the result is only 2. – Jon Claus Dec 15 '14 at 20:00
  • The second argument may not be a regex, but the `"\"` backslash is still interpreted as an escape character within it. It's used as such in the second argument of calls like this: `gsub("([[:digit:]]*)\\s*([[:alpha:]]*)", "\\2---->\\1", "101 trombones")` – Josh O'Brien Dec 15 '14 at 20:11
  • I'm surprised it doesn't complain in my first example, as there is an escape character and nothing else in the second argument. I know that doing the same thing for a regex, e.g. in the first argument, will throw an error over the trailing backslash. – Jon Claus Dec 15 '14 at 20:36
  • @JonClaus Agreed. That's the bit that's a mystery to me. – Josh O'Brien Dec 15 '14 at 20:44
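
The replacement-side escaping discussed in these comments can be checked with a short sketch. The first call uses a made-up two-letter input to show a backreference; the second reproduces the call from the question. Why a lone trailing backslash is silently dropped rather than rejected is not settled in this thread, so the snippet only records the observed result.

```r
# Backslash as an escape in the replacement: "\\1" and "\\2" are backreferences.
swapped <- gsub("(a)(b)", "\\2\\1", "ab")
swapped            # "ba"

# A replacement consisting of a single backslash (an escape with nothing
# after it) produces an empty string, as reported in the question.
lone <- gsub("\\\\", "\\", "\\")
nchar(lone)        # 0

# Two escaped backslashes in the replacement produce one literal backslash.
single <- gsub("\\\\", "\\\\", "\\")
nchar(single)      # 1
```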