s = "#main= 'quotes'"
s.gsub "'", "\\'" # => "#main= quotes'quotes"

This seems wrong. I expect to get "#main= \\'quotes\\'"

When I don't use an escape character, it works as expected:

s.gsub "'", "*" # => "#main= *quotes*"

So it must have something to do with escaping.

I'm using Ruby 1.9.2p290.

I need to replace each single quote with a backslash followed by a quote.

Even more inconsistencies:

"\\'".length # => 2
"\\*".length # => 2

# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)

# WTF next:
"'".gsub("'", "\\'").length # => 0

# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)

What is going on here?

Andrew Grimm
Dmytrii Nagirniak
  • Similar questions: http://stackoverflow.com/questions/6499443/escaping-apostrophes-using-gsub and http://stackoverflow.com/questions/2180322/ruby-gsub-doesnt-escape-single-quotes . I had to combine the [ruby] [gsub] tags, then look at the "FAQ" tag. – Andrew Grimm Aug 16 '11 at 07:47
  • @Andrew: Thanks for the librarian work. – mu is too short Aug 16 '11 at 08:04

3 Answers


You're getting tripped up by the specialness of \' inside a regular expression replacement string:

\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
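To make that list concrete, here is a small sketch that applies each sequence with `sub` to one made-up string (`s` and the pattern `re` are only illustrations). Each double-quoted literal uses `\\` so that the replacement string itself contains one backslash before the sequence character, which is exactly the situation in the question.

```ruby
# Illustrative input only: "def" is the match, "e" is group 1.
s  = "abc-def-ghi"
re = /d(e)f/

# Each literal like "<\\&>" yields a replacement string with ONE backslash
# before the sequence character, so sub treats it as a special sequence.
results = {
  "\\0" => s.sub(re, "<\\0>"),   # entire match
  "\\&" => s.sub(re, "<\\&>"),   # entire match
  "\\1" => s.sub(re, "<\\1>"),   # first group
  "\\`" => s.sub(re, "<\\`>"),   # prematch: text left of the match
  "\\'" => s.sub(re, "<\\'>"),   # postmatch: text right of the match
  "\\+" => s.sub(re, "<\\+>"),   # highest (last) matched group
}

results.each { |seq, out| puts "#{seq}  #{out}" }
```

The `\'` line prints `abc-<-ghi>-ghi`: the text after the match gets copied into the replacement, which is the same "doubling" seen in the question.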

So when you write "\\'", the double \\ becomes a single backslash and the string is \', but in a replacement string that means "the string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the special meaning of \':

s.gsub("'", "\\\\'")
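One way to see the two levels of escaping is to count characters at each stage. This sketch reuses the string from the question: the literal `"\\\\'"` is a three-character string (two backslashes and a quote), and `gsub` then collapses the escaped `\\` into one literal backslash.

```ruby
s = "#main= 'quotes'"

# Ruby literal level: four backslashes in the source become two in the string.
replacement = "\\\\'"
puts replacement.length          # 3 (backslash, backslash, quote)

# gsub level: "\\" inside the replacement string becomes one literal backslash.
out = s.gsub("'", replacement)
puts out                         # #main= \'quotes\'
puts out.length - s.length       # 2 (one backslash added per quote)
```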

Or avoid the toothpicks and use the block form:

s.gsub("'") { |m| '\\' + m }
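The block form works because `gsub` inserts the block's return value as-is, without scanning it for `\'`, `\&` and friends. This sketch checks that it agrees with the escaped string form, and that even returning the two characters `\'` from a block is taken literally rather than as the postmatch.

```ruby
s = "#main= 'quotes'"

from_block  = s.gsub("'") { |m| '\\' + m }  # '\\' in single quotes is one backslash
from_string = s.gsub("'", "\\\\'")          # string form needs the extra escaping

puts from_block                  # #main= \'quotes\'
puts from_block == from_string   # true

# A block's return value is not scanned for special sequences,
# so "\\'" (backslash + quote) is inserted literally.
from_block_literal = s.gsub("'") { "\\'" }
puts from_block_literal          # #main= \'quotes\'
```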

You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
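For example, here is what happens with a backtick, a digit and a plus sign when the backslash is escaped once versus twice. The input strings are made up for illustration.

```ruby
# Escaped once: the replacement string is \` or \1, which gsub treats as special.
p "a`b".gsub("`", "\\`")     # \` is the prematch ("a"), so this gives "aab"
p "v1".gsub("1", "\\1")      # \1 is group 1; a plain string pattern has none, so "v"

# Escaped twice: \\ becomes a literal backslash and the next character is kept as-is.
p "a`b".gsub("`", "\\\\`")   # => "a\\`b"
p "v1".gsub("1", "\\\\1")    # => "v\\1"
p "a+b".gsub("+", "\\\\+")   # => "a\\+b"
```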

The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.

mu is too short
  • Thanks a lot. That definitely clears things up. I just wonder what the substitutions `\&`, `` \` ``, `\'` and `\+` mean. I can't remember seeing them used anywhere. – Dmytrii Nagirniak Aug 16 '11 at 07:26
  • @Dmytrii: Looking at the global variable versions of those might help (http://www.zenspider.com/Languages/Ruby/QuickRef.html#19). I don't know of a good authoritative online reference for Ruby's regexes. – mu is too short Aug 16 '11 at 07:37
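As a sketch of that suggestion: after a successful match, the globals `$&`, `` $` ``, `$'` and `$+` hold the same values that `\&`, `` \` ``, `\'` and `\+` insert in a replacement string. The string and pattern here are made up for illustration.

```ruby
# =~ returns the match position and sets the match-related globals.
pos = "abc-def-ghi" =~ /d(e)f/
puts pos   # 4

puts $&    # entire match:          def
puts $`    # text before the match: abc-
puts $'    # text after the match:  -ghi
puts $+    # last matched group:    e

# The same values are available through the MatchData object in $~.
m = $~
p [m[0], m.pre_match, m.post_match, m[1]]   # => ["def", "abc-", "-ghi", "e"]
```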
s = "#main = 'quotes'"

s.gsub "'", "\\\\'"

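Since \ has to be written as \\ in a string literal, you need to write four backslashes to get a double backslash into the replacement string; gsub then turns that pair into a single literal backslash.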

Kleber S.
  • I don't want a double backslash, only one. That's why I escape one backslash and add a quote. `"\\'".length == 2`, which is correct. `gsub` gets it wrong by doubling the content. – Dmytrii Nagirniak Aug 16 '11 at 06:45

You need to escape the \ as well:

s.gsub "'", "\\\\'"

Output:

"#main= \\'quotes\\'"

A good explanation found on an outside forum:

The key point to understand IMHO is that a backslash is special in replacement strings. So, whenever one wants to have a literal backslash in a replacement string one needs to escape it and hence have [two] backslashes. Coincidentally a backslash is also special in a string (even in a single quoted string). So you need two levels of escaping, makes 2 * 2 = 4 backslashes on the screen for one literal replacement backslash.

source
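Following that reasoning, one way to use arbitrary text as a literal string-form replacement is to double every backslash in it first. `literal_replacement` is a hypothetical helper name for illustration, not part of Ruby. Note that the doubling itself goes through `gsub`, so its replacement literal needs eight backslashes: four in the string, which `gsub` reduces to two in the output.

```ruby
# Hypothetical helper (not a Ruby method): makes +text+ safe to use as a
# string-form gsub replacement by doubling every backslash, since backslash
# is the only character gsub treats specially in a replacement string.
def literal_replacement(text)
  # "\\\\\\\\" is a 4-backslash string; gsub turns each "\\" pair into one
  # literal backslash, so every "\" in text becomes "\\".
  text.gsub("\\", "\\\\\\\\")
end

s = "#main= 'quotes'"
puts s.gsub("'", literal_replacement("\\'"))    # #main= \'quotes\'
puts "x".gsub("x", literal_replacement("\\1"))  # \1 (kept literally, not a group reference)
```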

brentvatne
  • But it is already escaped: the string `"\\"` is a single backslash. So `"\\'"` is a string of length 2: first the backslash, then the quote. It gets messed up in gsub because gsub doubles the content (why?). – Dmytrii Nagirniak Aug 16 '11 at 06:42
  • Additionally `.gsub("'", "\\*")` works as expected, which is even more inconsistent. – Dmytrii Nagirniak Aug 16 '11 at 06:47
  • The problem is that [`\'` has a special meaning](http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UM) in a regex substitution; this somewhat obscure feature exists mostly for historical reasons. – mu is too short Aug 17 '11 at 02:19