2

What is the opposite of Regexp.escape ?

> Regexp.escape('A & B')
=> "A\\ &\\ B"
> # do something to get the next result (something like Regexp.unescape("A\\ &\\ B"))
=> "A & B"

How can I get the original value?

user2503775

4 Answers

3
replaces = Hash.new { |hash,key| key } # simple trick to return key if there is no value in hash
replaces['t'] = "\t"
replaces['n'] = "\n"
replaces['r'] = "\r"
replaces['f'] = "\f"
replaces['v'] = "\v"

rx = Regexp.escape('A & B')
str = rx.gsub(/\\(.)/){ replaces[$1] }

Also make sure to print the result with puts in irb, because irb shows return values via #inspect, which escapes backslashes and other special characters.
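A minimal sketch of the difference between #inspect and puts, using the 'H\B' string that comes up in the comments below. The variable name is only for illustration.

```ruby
escaped = Regexp.escape('H\B')   # input is 3 characters: H, backslash, B

# The escaped string holds two real backslashes: H, \, \, B
puts escaped.length    # 4

# inspect (what irb prints for a return value) doubles each backslash for display
puts escaped.inspect   # "H\\\\B"

# puts writes the actual characters
puts escaped           # H\\B
```

So the doubled backslashes seen in irb are a display artifact of #inspect, not extra characters in the string.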

Basically, escaping (quoting) looks for metacharacters and prepends a \ character to each (in source code that backslash itself has to be escaped inside a string literal). If it finds a control character from the list \t, \n, \r, \f, \v, quoting instead outputs a \ character followed by the corresponding ASCII letter (t, n, r, f or v).
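To make the two cases concrete, here is a small sketch (variable names are illustrative) showing what Regexp.escape does with a control character versus an ordinary metacharacter:

```ruby
# A real tab (one character) becomes backslash + "t" (two characters).
tab_escaped = Regexp.escape("a\tb")
puts tab_escaped.length   # 4  (a, \, t, b)
puts tab_escaped          # a\tb

# A metacharacter is simply prefixed with a backslash.
dot_escaped = Regexp.escape("a.b")
puts dot_escaped          # a\.b
```

This is why a plain "drop the backslash" approach is not enough: for control characters the character after the backslash has to be translated back as well.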

UPDATE:

My solution had problems with special characters (\n, \t and so on). I updated it after investigating the source code of the rb_reg_quote method.

UPDATE 2:

replaces is a hash that converts escaped characters to unescaped ones (that's why it is used in the block attached to gsub). It is indexed by the character without the escape character (the second character in the sequence) and looks up the unescaped value. The only defined values are control characters, but there is also a default_proc attached (the block passed to Hash.new), which returns the key if no value is found in the hash. So it works like this:

  1. for "n" it returns "\n", the same for all other escaped control characters, because it is value associated with key
  2. for "(" it returns "(", because there is no value associated with "(" key, hash calls #default_proc, which returns key itself
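The two cases above can be sketched as follows. The method name unescape_regexp and the test string are hypothetical, chosen only to show a round trip; the lookup table is shortened to two entries.

```ruby
# Hash whose default block returns the key itself when no value is stored.
replaces = Hash.new { |_hash, key| key }
replaces['n'] = "\n"
replaces['t'] = "\t"

puts replaces['n'] == "\n"   # true: stored value is returned
puts replaces['(']           # (   : nothing stored, so the key comes back

# Hypothetical helper wrapping the gsub from the answer.
def unescape_regexp(str, table)
  str.gsub(/\\(.)/) { table[$1] }
end

original   = "A & B (x)\ty\\z"   # contains spaces, parentheses, a tab and a backslash
round_trip = unescape_regexp(Regexp.escape(original), replaces)
puts round_trip == original      # true
```

Because gsub scans left to right and consumes two characters per match, an escaped backslash (\\) is handled as one unit and maps back to a single backslash.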

The only characters escaped by Regexp.escape are meta characters and control characters, so we don't have to worry about alphanumerics.

Take a look at http://ruby-doc.org/core-2.0.0/Hash.html#method-i-default_proc for documentation on #default_proc.

MBO
  • Thanks, But I'm getting double Slash for this: `Regexp.escape('H\B')` – user2503775 Sep 30 '13 at 14:22
  • `puts 'H\B'.inspect` (irb does it to return values) outputs double slash too. Try `puts 'H\B'` and then puts result of my solution's code – MBO Sep 30 '13 at 14:32
  • Can you please explain what `replaces[$1]` does? – user2503775 Oct 01 '13 at 06:21
  • @user2503775 See my Update 2 for explanation. Hope it helps understanding this trick – MBO Oct 01 '13 at 08:47
  • Thanks. About a single slash - I guess there is no solution, because `Regexp.escape('H\B')` gives me `"H\\\\B"`, so I'm getting a double slash for `"H\\\\B".gsub(/\\(.)/){ replaces[$1] }`. I got `H\\B` for every way that was suggested here. Or maybe you have an idea how to do this? – user2503775 Oct 01 '13 at 09:23
  • @user2503775 Please check http://codepad.org/Ehu1GJQ1. Are you checking results in irb? – MBO Oct 01 '13 at 09:41
  • I checked it in rails console and also in irb. And got `=> "H\\B"`: `irb(main):008:0> Regexp.escape('H\B').gsub(/\\(.)/){ replaces[$1] }` `=> "H\\B"` – user2503775 Oct 01 '13 at 10:20
  • @user2503775 You see, a double backslash is the default when you inspect results in irb (and rails console). That's why I mentioned you should check results with `puts`. If you enter only `"H\B"` in irb/rails console you get the same double backslash - not because it is there, but because the result string is inspected and returned with `"` around it and backslashes doubled. – MBO Oct 01 '13 at 10:35
  • Great! @MBO thank you very much! In the first time you wrote it, I didn't understand you correctly.. – user2503775 Oct 01 '13 at 10:56
1

Doing a regex replace with \\(?=([\\\*\+\?\|\{\[\(\)\^\$\.\#\ ]))\

should give you the unescaped string; you would only have to replace the \r\n sequences with their CR/LF counterparts.

"There\ is\ a\ \?\ after\ the\ \(white\)\ car\.\ \r\n\ it\ should\ be\ http://car\.com\?\r\n"

is unescaped to :

"There is a ? after the (white) car. \r\n it should be http://car.com?\r\n"

and replacing the \r\n sequences gives you:

There is a ? after the (white) car. 
 it should be http://car.com?
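A rough Ruby sketch of this approach, assuming the replacement is the empty string and that the character class is used as written in the answer. The variable names are illustrative.

```ruby
original = "There is a ? after the (white) car. \r\n it should be http://car.com?\r\n"
escaped  = Regexp.escape(original)

# Character class copied from the answer: a backslash is removed only when
# it is followed by one of the listed characters (checked via lookahead).
pattern  = /\\(?=([\\\*\+\?\|\{\[\(\)\^\$\.\#\ ]))/
stripped = escaped.gsub(pattern, '')

# Turn the remaining two-character sequences \r and \n into real CR and LF.
unescaped = stripped.gsub('\r', "\r").gsub('\n', "\n")

puts unescaped == original   # true
```

Note that the class as given does not list ], } or -, which Regexp.escape also escapes, and a string containing literal backslashes would need more care than this sketch takes.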
Sedecimdies
1

You can perhaps use something like this?

def unescape(s)
  eval %Q{"#{s}"}
end

puts unescape('A\\ &\\ B')

Credits to this question.

codepad demo

If you are okay with a regex solution, you can use this:

res = s.gsub(/\\(?!\\)|(\\)\\/, "\\1")
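As a small illustration of how that substitution behaves (the input strings are made up for the example): a lone backslash is dropped, while a doubled backslash collapses to one.

```ruby
# Input with a literal backslash: A & B \ C
s   = Regexp.escape('A & B \ C')
res = s.gsub(/\\(?!\\)|(\\)\\/, "\\1")
puts res                      # A & B \ C

# Control characters are not translated back: the tab ends up as a plain "t".
tab_res = Regexp.escape("a\tb").gsub(/\\(?!\\)|(\\)\\/, "\\1")
puts tab_res                  # atb
```

So this regex fits strings without tabs, newlines or other control characters; for those, a lookup step like the hash-based answer above is still needed.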

codepad demo

Jerry
  • 1
    I always feel unsafe when using eval. You can insert evil code when using it: `unescape('"; puts 42#"')` (prints 42, but could possibly execute a script that deletes your SO account) – tessi Sep 30 '13 at 10:18
  • @tessi Mhm, I can understand that. I had tried the answers which suggested alternatives but they either used `eval` as well or didn't work. The YAML module seemed to be working for the OP there, but somehow it isn't on codepad; might be the version... – Jerry Sep 30 '13 at 10:24
  • Thanks, but I prefer not to use `eval` ... Hoping to find another choice. – user2503775 Sep 30 '13 at 13:12
  • @user2503775 I added a `.gsub` method. – Jerry Sep 30 '13 at 14:39
  • If you don't already know without any doubt whatsoever that `s` is an escaped regex created entirely under your control, `eval` seems horribly unsafe. Try `unescape('";print "hi!')`. – cHao Sep 30 '13 at 14:40
  • @MBO You're right, and I don't know enough Ruby to put a function in the replace, while I believe yours work correctly :) – Jerry Sep 30 '13 at 15:25
  • @cHao I guess, yes :( [tessi](http://stackoverflow.com/users/1881769/tessi) already mentioned it in an earlier comment, if you can see it. – Jerry Sep 30 '13 at 15:27
  • @Jerry: Yeah...i see it now. The example wasn't there when i was posting. Oh well. :) – cHao Sep 30 '13 at 15:32
1

Try this

>> r = Regexp.escape("A & B (and * c [ e] + )")
# => "A\\ &\\ B\\ \\(and\\ \\*\\ c\\ \\[\\ e\\]\\ \\+\\ \\)"
>> r.gsub("\\(","(").gsub("\\)",")").gsub("\\[","[").gsub("\\]","]").gsub("\\{","{").gsub("\\}","}").gsub("\\.",".").gsub("\\?","?").gsub("\\+","+").gsub("\\*","*").gsub("\\ "," ")
# => "A & B (and * c [ e] + )"

Basically, these (, ), [, ], {, }, ., ?, +, * are the meta characters in regex. And also \ which is used as an escape character.

The chain of gsub() calls replaces each escaped pattern with the corresponding actual character.

I am sure there is a way to DRY this up.

Update: DRY version as suggested by user2503775

>> r.gsub("\\","")
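A quick check of where stripping every backslash works and where it does not. The lambda name strip_all is just for this sketch.

```ruby
strip_all = ->(escaped) { escaped.gsub("\\", "") }

plain     = strip_all.call(Regexp.escape("A & B"))
backslash = strip_all.call(Regexp.escape('H\B'))
tab       = strip_all.call(Regexp.escape("a\tb"))

puts plain       # A & B
puts backslash   # HB   (the literal backslash is lost)
puts tab         # atb  (the tab came back as a plain "t")
```

It is fine for input that contains only metacharacters and spaces, but it loses literal backslashes and does not restore control characters.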

Update:

The following are the special characters in regex:

    [,],{,},(,),|,-,*,.,\\,?,+,^,$,<space>,#,\t,\f,\v,\n,\r
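One way to check this list against Regexp.escape itself is to feed it each character and see which ones come back changed. The array below simply mirrors the list above.

```ruby
candidates = ['[', ']', '{', '}', '(', ')', '|', '-', '*', '.', '\\', '?',
              '+', '^', '$', ' ', '#', "\t", "\f", "\v", "\n", "\r"]

# Keep only the characters that Regexp.escape actually alters.
changed = candidates.select { |ch| Regexp.escape(ch) != ch }
puts changed.length == candidates.length   # true: every listed character is escaped

# Characters outside the list pass through untouched.
untouched = Regexp.escape('a1_&/')
puts untouched                             # a1_&/
```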
Litmus
  • Would that be true for all types of special characters? – user2503775 Sep 30 '13 at 12:21
  • I cannot say about multi-byte characters. Otherwise, escaping is required only for characters that have special meaning for regex engine. And the characters mentioned above are the only ones I am aware of. – Litmus Sep 30 '13 at 12:26
  • 1
    `r.gsub("\\","")` gives me the same result.. Why do I need the entire line? – user2503775 Sep 30 '13 at 13:06
  • 2
    Also... What if I have the character \ in my string? – user2503775 Sep 30 '13 at 13:09
  • Yes. you are right `r.gsub("\\","")` works. That is super DRY. And I guess you can't escape a "\" in a string. `Regexp.escape("\")` returns nothing. – Litmus Sep 30 '13 at 14:28
  • @Eternal-Learner: `Regexp.escape("\")` would be invalid syntax; the backslash escapes the would-be closing quote, so you never end the string. Try `Regexp.escape("\\")` instead. – cHao Sep 30 '13 at 15:35
  • @Eternal-Learner There is also problem with your gsubs: `"\("` (and other versions too) is just `"("` in Ruby, you need to double backslashes there. – MBO Sep 30 '13 at 15:44