I've searched many times, and haven't found the answer here or elsewhere. I want to replace each space ' ' in variables containing file names with a '\ '. (A use case could be for shell commands, with the spaces escaped, so each file name doesn't appear as a list of arguments.) I have looked through the StackOverflow question "how to replace single backslash in R", and find that many combinations do work as advertised:

> gsub(" ", "\\\\", "a b")
[1] "a\\b"

> gsub(" ", "\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

but try these with a single-slash version, and R ignores it:

> gsub(" ", "\\ ", "a b")
[1] "a b"

> gsub(" ", "\ ", "a b", fixed = TRUE)
[1] "a b"

In the opposite direction, removing backslashes from a string, the doubled-slash versions work:

> gsub("\\\\", " ", "a\\b")
[1] "a b"

> gsub("\\", " ", "a\\b", fixed = TRUE)
[1] "a b"

However, for single slashes some inner perversity in R prevents me from even attempting to remove them:

> gsub("\\", " ", "a\\b")
Error in gsub("\\", " ", "a\\b") : 
  invalid regular expression '\', reason 'Trailing backslash'

> gsub("\", " ", "a\b", fixed = TRUE)
Error: unexpected string constant in "gsub("\", " ", ""

The 'invalid regular expression' is telling us something, but I don't see what. (Note too that the perl = TRUE option does not help.)

Even with three backslashes R fails to notice even one:

> gsub(" ", "\\\ ", "a b")
[1] "a b"

The pattern extends too! Even multiples of two work:

> gsub(" ", "\\\\\\\\", "a b")
[1] "a\\\\b"

but not odd multiples (I should get '\\\ '):

> gsub(" ", "\\\\\\ ", "a b")
[1] "a\\ b"

> gsub(" ", "\\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

(I would expect 3 slashes, not two.)

My two questions are:

  • How can my goal of replacing a ' ' with a '\ ' be accomplished?
  • Why did the odd number-slash variants of the replacements fail, while the even number-slash replacements worked?

For shell commands a simple work-around is to quote the file names, but part of my interest is just wanting to understand what is going on with R's regex engine.
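
The quoting work-around mentioned above can be sketched with base R's `shQuote()`. This is only an illustration: the file name and the `ls -l` command are made up, and `type = "sh"` is set explicitly because the default quoting style differs on Windows.

```r
# Hypothetical file name containing spaces (illustrative only).
fname <- "my file name.txt"

# Wrap the name in single quotes, POSIX-shell style.
quoted <- shQuote(fname, type = "sh")

# Build a command string; the quoted name stays a single argument.
cmd <- paste("ls -l", quoted)
cat(cmd, "\n")
```

The resulting string could then be passed to something like `system(cmd)` on a Unix-like system.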

user3897315
  • Try `cat()`ing some of your results. `\b` is a backspace character, just as `\n` is a line break and `\t` is a tab, e.g. `cat("remove_one_letter_from_the_end\b")`. Two slashes, "\\", are actually a single slash; the R console just adds an extra one to escape it: `cat("\\", sep="\n")` – thelatemail Aug 12 '16 at 03:16
  • @thelatemail You posted your comment when I was halfway writing an answer. – Tim Biegeleisen Aug 12 '16 at 03:19
  • @TimBiegeleisen - all good - I think you've covered the main issues! – thelatemail Aug 12 '16 at 03:20
  • A good suggestion @thelatemail. Clearly some, if not all of the explanation is in what `print()` is doing. However, if I try `cat(gsub(" ", "\\ ", "a z"))`, the result is `"a z"` with the newline removed — but notice that the z is not escaped, and the space between the letters remains ... – user3897315 Aug 12 '16 at 04:05
  • @user3897315 - it gets more complex inside `gsub` - because a slash can be used to specify things like `\s` for a space in regex, it needs to have extra escaping, so R needs `\\s` to represent a space. Which means you actually need four slashes "\\\\" to represent a slash because otherwise it will mean nothing - `cat(gsub(" ", "\\\\ ", "a z"))` – thelatemail Aug 12 '16 at 04:09
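
The two escaping layers described in the comments can be checked side by side. The sketch below assumes the default regex engine. The variable names are illustrative only. In regex mode, the R parser first halves the backslashes in the source text, and the replacement syntax then halves them again, so a lone backslash left in front of the space is silently dropped. That would explain why the odd-count variants in the question appear to lose a slash.

```r
x <- "a b"

# Regex mode: the replacement treats "\" as an escape character, so a
# literal backslash is "\\" at the regex level, i.e. "\\\\" in R source.
regex_result <- gsub(" ", "\\\\ ", x)

# Fixed mode: the replacement is taken literally, so only R's own string
# escaping applies and "\\ " is backslash + space.
fixed_result <- gsub(" ", "\\ ", x, fixed = TRUE)

# Regex mode with "\\ ": the replacement is backslash + space, the
# backslash merely escapes the space, and so it disappears.
lost_result <- gsub(" ", "\\ ", x)

# cat() shows the real characters, one result per line.
cat(regex_result, fixed_result, lost_result, sep = "\n")

# Both successful variants hold 4 characters: a, backslash, space, b.
nchar(regex_result)
```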

2 Answers

Get ready for a face-palm, because this:

> gsub(" ", "\\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

is actually working.

The two backslashes you see are just the R console's way of displaying a single backslash, which is escaped when printed to the screen.
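
One way to check this without leaving the console is to count the characters and look at their code points. This is a small sketch using base R functions only, and `z` is just an illustrative name.

```r
z <- gsub(" ", "\\ ", "a b", fixed = TRUE)

print(z)        # escaped display form, with the backslash doubled
nchar(z)        # number of real characters in the string
utf8ToInt(z)    # code points: 92 is the backslash, 32 is the space
writeLines(z)   # raw characters, as they would be written to a file
```

If the string really held two backslashes, `nchar(z)` would report 5 rather than 4.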

To confirm the replacement with a single backslash is indeed working, try writing the output to a text file and inspect it yourself:

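# Open a connection to a text file (adjust the path for your system)
f <- file("C:\\output.txt")
# With fixed = TRUE, the replacement "\\" is one literal backslash
writeLines(gsub(" ", "\\", "a b", fixed = TRUE), f)
close(f)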

In output.txt you should see the following:

a\b

Tim Biegeleisen
  • I don't think you need the third slash - `gsub(" ", "\\ ", "a b", fixed = TRUE)` should be sufficient – thelatemail Aug 12 '16 at 03:20
  • Thank you. Yes to the face-palm; it makes sense, now that I see R's print statement is including those 'helpful' escaping backslashes. I don't yet see why it is helpful to have the extra backslash in the output of the print (neither Perl nor Python includes it in its displayed prints), but with that known, I am able to proceed. – user3897315 Aug 12 '16 at 17:19
  • @user51462 `fixed = TRUE` tells R to not use a regex engine when replacing. Instead, it will use the literal expression you pass it. – Tim Biegeleisen Jun 23 '20 at 11:51
  • Um...it _does_ only apply to the first argument. – Tim Biegeleisen Jun 23 '20 at 12:55

Very helpful discussion! (I've been Googling the heck out of this for 2 days.)

Another way to see the difference (rather than writing to a file) is to compare the contents of the string using print and cat.

z <- gsub(" ", "\\ ", "a b", fixed = TRUE)

> print(z)
[1] "a\\ b"

> cat(z)
a\ b

So, by using cat instead of print we can confirm that the gsub line is doing what was intended when we're trying to add single backslashes to a string.
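
If you are on R 4.0.0 or later, raw string literals avoid one layer of escaping altogether. This is a sketch under that version assumption, and the variable names are illustrative only.

```r
# Raw string syntax r"(...)" requires R >= 4.0.0.
# Inside it, a backslash is just a backslash.
z_raw <- gsub(" ", r"(\ )", "a b", fixed = TRUE)

# Equivalent call with conventional escaping.
z_esc <- gsub(" ", "\\ ", "a b", fixed = TRUE)

identical(z_raw, z_esc)
cat(z_raw, "\n")
```

Without `fixed = TRUE` the regex replacement layer still applies, so the raw string would need to be `r"(\\ )"`.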

D. Woods
  • And maybe/ideally future versions of `base::shQuote` could simplify/automate this process for all of us? – D. Woods Dec 06 '18 at 16:14