42

I'm using URI.encode to generate HTML data URLs:

visit "data:text/html,#{URI::encode(html)}"

After upgrading to Ruby 2.7.1, the interpreter started warning:

warning: URI.escape is obsolete

The recommended replacements are CGI.escape and URI.encode_www_form_component. However, they don't do the same thing:

2.7.1 :007 > URI.escape '<html>this and that</html>'
(irb):7: warning: URI.escape is obsolete
 => "%3Chtml%3Ethis%20and%20that%3C/html%3E"
2.7.1 :008 > CGI.escape '<html>this and that</html>'
 => "%3Chtml%3Ethis+and+that%3C%2Fhtml%3E"
2.7.1 :009 > URI.encode_www_form_component '<html>this and that</html>'
 => "%3Chtml%3Ethis+and+that%3C%2Fhtml%3E"

The result of these slight encoding differences is an HTML page where spaces are replaced by +. What's a good replacement for URI.encode for this use case?

engineersmnky
Tadas Sasnauskas
    Take a look at [`ERB::Util.url_encode`](https://ruby-doc.org/stdlib-2.7.1/libdoc/erb/rdoc/ERB/Util.html#url_encode-method) – it encodes space as `%20` and also encodes `/` as `%2F` (which is perfectly fine) – Stefan Dec 23 '20 at 12:53

4 Answers

50

There is actually a drop-in replacement.

s = '<html>this and that</html>'
parser = URI::Parser.new
parser.escape(s)
=> "%3Chtml%3Ethis%20and%20that%3C/html%3E"

Docs: https://docs.w3cub.com/ruby~3/uri/rfc2396_parser

I found this through a comment under this article: https://docs.knapsackpro.com/2020/uri-escape-is-obsolete-percent-encoding-your-query-string

I also tested this against some other strings in my setup. It seems to retain commas the same way URI.escape does, in contrast to ERB::Util.url_encode.
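A minimal sketch of that comma difference, assuming the `uri` and `erb` standard libraries are available. The sample string is made up for illustration, and URI::RFC2396_Parser is used here as the explicit name of the RFC 2396 parser class that the docs link above describes:

```ruby
require "uri"
require "erb"

# Explicit RFC 2396 parser; its #escape keeps reserved characters such as "," and "/".
parser = URI::RFC2396_Parser.new

sample = "a,b c" # hypothetical input, not from the original question

parser_result = parser.escape(sample)
erb_result = ERB::Util.url_encode(sample)

puts parser_result # a,b%20c
puts erb_result    # a%2Cb%20c
```

Both encode the space as %20; only the parser leaves the comma untouched.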

NOTE: Since this answer has become popular, it's worth mentioning that you should not blindly switch your code to URI::Parser unless you are certain your project doesn't need a standards-compliant encoder; URI.escape was deprecated for a reason. Before switching, make sure you have read and understood https://stackoverflow.com/a/13059657/6376353

Stefan Horning
    Using `URI::Parser#escape` is _exactly_ the same as using `URI::escape`, except without the warning. And that's fine, if that's what you want, but it's important to realize that this is not a different solution. If you look at the source code, `URI::escape` calls `URI::DEFAULT_PARSER.escape`, and `URI::DEFAULT_PARSER` is an instance of `URI::Parser`. – Steve Oct 05 '21 at 01:53
6

There is no official RFC 3986-compliant URI escaper in the Ruby standard library today.

See "Why is URI.escape() marked as obsolete and where is this REGEXP::UNSAFE constant?" for background.

There are several methods available, but as you have discovered and pointed out in the comment, each has at least one of these issues:

  • They produce deprecation warnings
  • They do not claim standards compliance
  • They are not escaping in accordance with RFC 3986
  • They are implemented in tangentially related libraries
D. SM
6

From Apidock.com:

require "erb"
include ERB::Util

puts url_encode("Programming Ruby:  The Pragmatic Programmer's Guide")

Generates

Programming%20Ruby%3A%20%20The%20Pragmatic%20Programmer%27s%20Guide
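Applied to the data URL from the question, this could look like the sketch below. It only builds the string; the `visit` call from the question is left out, and the variable names are illustrative:

```ruby
require "erb"

html = "<html>this and that</html>"

# url_encode percent-encodes spaces as %20 (not "+") and also encodes "/" as %2F.
data_url = "data:text/html,#{ERB::Util.url_encode(html)}"

puts data_url # data:text/html,%3Chtml%3Ethis%20and%20that%3C%2Fhtml%3E
```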

gordie
2

One alternative, in newer Rubies, is URI.encode_uri_component. But it percent-encodes [] and =, too.

# ruby 2.2, the original URI.encode:
irb(main):007:0> URI.encode "test_query[]=value with spaces"
=> "test_query[]=value%20with%20spaces"
# NOTE: ruby 2.2 doesn't have URI.encode_uri_component

# ruby 3.2.1

# use this to encode a single component (the stuff before and after "=")
irb(main):009:0> URI.encode_uri_component "test_query[]=value with spaces"
=> "test_query%5B%5D%3Dvalue%20with%20spaces"

irb(main):014:0> CGI.escapeURIComponent "test_query[]=value with spaces"
=> "test_query%5B%5D%3Dvalue%20with%20spaces"

# NOTE: different format/ RFC
irb(main):010:0> URI.encode_www_form_component "test_query[]=value with spaces"
=> "test_query%5B%5D%3Dvalue+with+spaces"

# NOTE: different format/ RFC
irb(main):013:0> CGI.escape "test_query[]=value with spaces"
=> "test_query%5B%5D%3Dvalue+with+spaces"

As you can see, none of the above is a direct replacement for URI.encode. But the other ones are correct in specific scenarios.

Replacing [] is correct. Replacing = is not, for this use case. So if you want to construct query parameters manually, for example, you should do something like ["test_query[]", "value with spaces"].map {|x| URI.encode_uri_component(x) }.join("=").

I.e. the "component" in the name is the keyword here.
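A rough sketch of building a query string component by component. The helper name `encode_component` is hypothetical, and the fallback branch is an assumption for Rubies that lack URI.encode_uri_component: it relies on URI.encode_www_form_component emitting "+" only for spaces, since a literal "+" is encoded as %2B.

```ruby
require "uri"

# Hypothetical helper: percent-encode a single key or value.
def encode_component(str)
  if URI.respond_to?(:encode_uri_component)
    URI.encode_uri_component(str)
  else
    # Older Rubies: form-encode, then turn the "+" used for spaces into %20.
    URI.encode_www_form_component(str).gsub("+", "%20")
  end
end

params = { "test_query[]" => "value with spaces" }

# Encode keys and values separately so the "=" and "&" separators stay literal.
query = params.map { |key, value| "#{encode_component(key)}=#{encode_component(value)}" }.join("&")

puts query # test_query%5B%5D=value%20with%20spaces
```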

Relevant Ruby issue: https://bugs.ruby-lang.org/issues/17309

Dalibor Filus