Best way to escape and unescape strings in Ruby?

Question

Does Ruby have any built-in method for escaping and unescaping strings? In the past, I've used regular expressions; however, it occurs to me that Ruby probably does such conversions internally all the time. Perhaps this functionality is exposed somewhere.

So far I've come up with these functions. They work, but they seem a bit hacky:

def escape(s)
  s.inspect[1..-2]
end

def unescape(s)
  eval %Q{"#{s}"}
end

Is there a better way?

Escape for what purpose? For use in Ruby source? – mu is too short Dec 26 '11 at 22:50
@mu is too short: Yes, escaping as per Ruby source rules. – jwfearn Dec 26 '11 at 23:25

score 25 · Accepted Answer · answered Nov 16 '18 at 18:10

Ruby 2.5 added String#undump as a complement to String#dump:

$ irb
irb(main):001:0> dumped_newline = "\n".dump
=> "\"\\n\""
irb(main):002:0> undumped_newline = dumped_newline.undump
=> "\n"

With it:

def escape(s)
  s.dump[1..-2]
end

def unescape(s)
  "\"#{s}\"".undump
end

$ irb
irb(main):001:0> escape("\n \" \\")
=> "\\n \\\" \\\\"
irb(main):002:0> unescape("\\n \\\" \\\\")
=> "\n \" \\"
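A quick non-interactive check of the two helpers above, assuming Ruby 2.5 or later (where `String#undump` exists). The sample strings are arbitrary. The point is that `dump` escapes every non-ASCII character, so `escape` always returns plain ASCII and even accented text and emoji survive the round trip, and that `undump` raises on input that is not a valid dumped string (here, an unescaped `"`) instead of evaluating it the way the `eval` version would.

```ruby
# Helpers from the answer above (String#undump needs Ruby 2.5+)
def escape(s)
  s.dump[1..-2]          # drop the surrounding double quotes added by #dump
end

def unescape(s)
  "\"#{s}\"".undump      # re-add the quotes and let #undump parse the escapes
end

samples = ["\n \" \\", "tab\there", "caf\u00e9 \u{1F600}", "\e[0m"]

samples.each do |s|
  escaped = escape(s)
  # #dump escapes non-ASCII characters, so the escaped form is plain ASCII
  raise "not ASCII: #{escaped}" unless escaped.ascii_only?
  raise "round trip failed for #{s.inspect}" unless unescape(escaped) == s
  puts escaped
end

# #undump rejects malformed input (RuntimeError) rather than executing it
begin
  unescape('a"b')
rescue RuntimeError => e
  puts "rejected: #{e.message}"
end
```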

Stanislav O. Pogrebnyak · Answer 2 · 2011-12-27T15:41:57.490

19

There are a bunch of escaping methods, some of them:

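# Some of Ruby's escaping methods (URI and CGI need require 'uri' / require 'cgi')
>> Regexp.escape('\*?{}.')
=> "\\\\\\*\\?\\{\\}\\."
>> URI.escape("test=100%")   # obsolete; removed in Ruby 3.0
=> "test=100%25"
>> CGI.escape("test=100%")
=> "test%3D100%25"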

So it really depends on the problem you need to solve. But I would avoid using inspect for escaping.
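To make "it depends on the purpose" concrete, here is a small sketch of two purpose-specific escapers. Because `URI.escape` no longer exists in Ruby 3.0+, it uses `URI.encode_www_form_component` as one replacement for query-string components (the choice of replacement is an assumption about the use case; `CGI.escape` from the list above is another). The input string is arbitrary.

```ruby
require 'uri'

input = "test=100% a.b*"

# Regexp source: metacharacters are backslash-escaped, so the resulting
# pattern matches the original text literally
pattern = Regexp.new(Regexp.escape(input))
puts pattern.source
puts pattern.match?("x #{input} y")

# Form/query-string component, with its matching decoder
encoded = URI.encode_www_form_component(input)
puts encoded
puts URI.decode_www_form_component(encoded)
```

Neither of these produces Ruby-source escapes; each inverts only its own encoding.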

Update: there is also dump, a close relative of inspect, and it looks like it is what you need:

>> "\n\t".dump
=> "\"\\n\\t\""

edited Dec 27 '11 at 15:41

answered Dec 27 '11 at 00:03

I'd like to avoid `inspect` too. I was hoping Ruby's own string escaping code might be available. Something along the lines of `Ruby.escape("\t") => "\\t"` and `Ruby.unescape("\\t") => "\t"` – jwfearn Dec 27 '11 at 13:53

score 17 · Answer 3 · answered Feb 28 '14 at 08:43

Caleb's function was the closest thing to the reverse of String#inspect that I was able to find; however, it contained two bugs:

\\ was not handled correctly.
\x.. retained the backslash.

I fixed the above bugs and this is the updated version:

UNESCAPES = {
    'a' => "\x07", 'b' => "\x08", 't' => "\x09",
    'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
    'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
    "\"" => "\x22", "'" => "\x27"
}

def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # escape \u0000 unicode
      ["#$2".hex].pack('U*')
    elsif $3 # escape \0xff or \xff
      [$3].pack('H2')
    end
  }
end

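# To test it: unescape each line read from standard input until EOF
# (STDIN.gets returns nil at EOF, which would crash an endless loop)
$stdin.each_line do |line|
  puts unescape(line)
end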

antirez

Thanks for the updates! I'd have fixed it if you commented, though. – Caleb Fenton Jun 06 '14 at 00:11
@antirez this is very useful. I've incorporated it into a puppet module I made as a [puppet function](https://github.com/gene1wood/puppet-credstash/blob/4f5879192ab07bd8de07daeb49ab50e9d00ff563/lib/puppet/parser/functions/unescape.rb) – gene_wood Jun 11 '15 at 22:22
@antirez This is the best answer I found so far. Just a tip, instead of using hexadecimal, the actual escaped chars can be used. For example, instead of `"\x0a"`, it can be `"\n"`. I think this is more clear. – rigon Mar 24 '17 at 11:09

b4hand · Answer 4 · 2018-12-13T23:27:53.277

16

Update: I no longer agree with my own answer, but I'd prefer not to delete it. I suspect that others may go down this wrong path, and there has already been a lot of discussion of this answer and its alternatives, so I think it still contributes to the conversation. Please don't use this answer in real code.

If you don't want to use eval, but are willing to use the YAML module, you can use it instead:

require 'yaml'

def unescape(s)
  YAML.load(%Q(---\n"#{s}"\n))
end

The advantage of YAML over eval is that it is presumably safer. Also, cane (a code-quality checker) disallows all usage of eval. I've seen recommendations to use $SAFE along with eval, but that is not currently available in JRuby.

For what it is worth, Python does have native support for unescaping backslashes.

edited Dec 13 '18 at 23:27

answered Sep 11 '13 at 22:16

Thank you. I took your idea and applied it to JSON, `JSON.parse("[#{s}]").first` – akuhn Jan 31 '14 at 03:49
It seems the YAML code and the eval code behave differently. For example, with `s = "\\xD8\\x96a"`, `YAML.load(%Q(---\n"#{s}"\n))` and `eval %Q{"#{s}"}` return different values. – MKo Jul 29 '16 at 14:29
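MKo's observation can be reproduced with a shorter input. In YAML double-quoted scalars, `\xHH` denotes a Unicode code point (so `\xD8` is "Ø", two bytes in UTF-8), whereas in a Ruby string literal `\xD8` is a single raw byte. The sketch below also tries akuhn's JSON idea, assuming the input is the body of a JSON string without its surrounding quotes; note that JSON's escape set is smaller than Ruby's (no `\x..`, `\e`, `\a` or `\v`). The function names are hypothetical.

```ruby
require 'yaml'
require 'json'

# b4hand's YAML-based unescape (for comparison only; its author no longer recommends it)
def unescape_yaml(s)
  YAML.load(%Q(---\n"#{s}"\n))
end

# akuhn's JSON idea: wrap the body in quotes and an array, then parse
def unescape_json(s)
  JSON.parse(%Q(["#{s}"])).first
end

s = "\\xD8"   # the four characters: backslash, x, D, 8

# YAML reads \xD8 as the code point U+00D8 and encodes it as UTF-8
p unescape_yaml(s).bytes
# In a Ruby literal, \xD8 is the single raw byte 0xD8
p "\xD8".bytes

# JSON handles \n and \uXXXX, but not Ruby-only escapes such as \e
p unescape_json("a\\nb \\u00e9")
```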

the Tin Man · Answer 5 · 2013-08-30T22:16:07.237

13

Ruby's inspect can help:

"a\nb".inspect
=> "\"a\\nb\""

Normally if we print a string with an embedded line-feed, we'd get:

puts "a\nb"
a
b

If we print the inspected version:

puts "a\nb".inspect
"a\nb"

Assign the inspected version to a variable and you'll have the escaped version of the string.

To undo the escaping, eval the string:

puts eval("a\nb".inspect)
a
b

I don't really like doing it this way. It's more of a curiosity than something I'd do in practice.

edited Aug 30 '13 at 22:16

answered Dec 26 '11 at 23:43

Danger will Robinson, Danger! Using eval to unescape the string is really dangerous if the string happens to be user input! It would allow the user to effectively run just about anything. – James P McGrath Aug 30 '13 at 10:25
Yes, if it is user input it should be sanitized first. But it can't run just anything, only what the user ID running the code could run, which in a correctly written application will have reduced privileges or be confined to a chroot sandbox. – the Tin Man Aug 30 '13 at 22:14
You are right. But in reality, much of the value on a box is not the operating system files but your data. You can chroot your Rails app as much as you like, but it still needs access to your database. So whilst your attacker can't do "everything", they can do a lot, including dumping all your data. – James P McGrath Aug 31 '13 at 06:21
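The risk James P McGrath describes can be shown with a harmless payload. The eval-based helper (the question's `unescape`, renamed here as the hypothetical `unescape_with_eval`) wraps its input in double quotes and evaluates it, so any `#{...}` interpolation in the "data" is executed as Ruby code. Here it only computes `1 + 1`, but the same hole would run anything.

```ruby
# The question's eval-based helper, reproduced for demonstration only
def unescape_with_eval(s)
  eval %Q{"#{s}"}
end

# Works as intended for ordinary escapes
p unescape_with_eval("a\\tb")

# Single quotes keep the payload as literal text: the characters #{1 + 1}
payload = '#{1 + 1}'

# eval treats the text as interpolation and runs the expression
puts unescape_with_eval(payload)
```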

Caleb Fenton · Answer 6 · 2014-06-06T00:10:09.807

YAML's ::unescape doesn't seem to escape quote characters, e.g. ' and ". I'm guessing this is by design, but it makes me sad.

You definitely do not want to use eval on arbitrary or client-supplied data.

This is what I use. Handles everything I've seen and doesn't introduce any dependencies.

UNESCAPES = {
    'a' => "\x07", 'b' => "\x08", 't' => "\x09",
    'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
    'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
    "\"" => "\x22", "'" => "\x27"
}

def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # escape \u0000 unicode
      ["#$2".hex].pack('U*')
    elsif $3 # escape \0xff or \xff
      [$3].pack('H2')
    end
  }
end

To handle `"\u{12345}"` type encoding for extended unicode characters (such as emoji), I added `|u{([\da-fA-F]+)}` to the regexp like `/\\(?:([#{keys}])|u([\da-fA-F]{4})|u{([\da-fA-F]+)})|\\0?x([\da-fA-F]{2})/`, changed the `$3` references to `$4`, and inserted `elsif $3; ["#$3".hex].pack('U*')` between the $2 and $4 sections. — Grant Neufeld, Jun 05 '20 at 05:59
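Combining Grant Neufeld's `\u{...}` extension with rigon's tip about readable escapes, one possible version of the gsub-based function is sketched below. Two changes are my own assumptions, not from the answers: the backslash is stored under a one-character key (so no special case is needed), and `\xHH` bytes are tagged with the input's encoding so that gsub does not mix a binary string into a UTF-8 one. It still ignores octal escapes and other rarer Ruby escape forms.

```ruby
# Single-character escapes, written with Ruby's own escape syntax for
# readability (rigon's suggestion) instead of hex codes
UNESCAPES = {
  'a' => "\a", 'b' => "\b", 't' => "\t",
  'n' => "\n", 'v' => "\v", 'f' => "\f",
  'r' => "\r", 'e' => "\e", '\\' => '\\',
  '"' => '"', "'" => "'"
}.freeze

# Groups: 1 = single-character escape, 2 = \uXXXX, 3 = \u{...},
# 4 = \xHH (the optional 0 keeps the answer's \0xHH spelling working)
UNESCAPE_RE = /\\(?:([#{Regexp.escape(UNESCAPES.keys.join)}])|u([\da-fA-F]{4})|u\{([\da-fA-F]+)\})|\\0?x([\da-fA-F]{2})/

def unescape(str)
  str.gsub(UNESCAPE_RE) do
    if $1
      UNESCAPES[$1]
    elsif $2 || $3
      # Unicode code point encoded as UTF-8; raises RangeError for
      # values that are not valid code points (e.g. surrogates)
      ($2 || $3).hex.chr(Encoding::UTF_8)
    else
      # One raw byte, like "\xHH" in a Ruby literal, tagged with the
      # input's encoding to avoid an encoding clash inside gsub
      [$4].pack('H2').force_encoding(str.encoding)
    end
  end
end

p unescape('tab:\t quote:\" backslash:\\\\ caf\u00e9 \u{1F600} \x41')
```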

score 5 · Answer 7 · answered Oct 22 '16 at 14:53

I suspect that Shellwords.escape will do what you're looking for:

https://ruby-doc.org/stdlib-1.9.3/libdoc/shellwords/rdoc/Shellwords.html#method-c-shellescape
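For reference, a small sketch of what Shellwords does: it escapes a string for a POSIX shell command line, and `Shellwords.split` is the matching parser that undoes it. Note that this follows shell quoting rules rather than the Ruby-source rules the asker mentioned in the comments, so it is only a fit when the target is a shell. The sample argument is arbitrary.

```ruby
require 'shellwords'

arg = "It's a \"test\" $HOME"

# Escape for a POSIX shell: unsafe characters get a backslash
escaped = Shellwords.escape(arg)
puts "echo #{escaped}"

# Shellwords.split parses shell words, undoing the escaping
p Shellwords.split(escaped)
```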

MattyB
