I frequently need to convert a String into a Regexp. For many strings, Regexp.new(string) is sufficient. But if string contains special characters, they need to be escaped:

string = "foo(bar)"
regex = Regexp.new(string) # => /foo(bar)/
!!regex.match(string) # => false

The Regexp class has a nice way to escape all characters that are special to regex: Regexp.escape. It's used like so:

string = "foo(bar)"
escaped_string = Regexp.escape(string) # => "foo\\(bar\\)"
regex = Regexp.new(escaped_string) # => /foo\(bar\)/ 
!!regex.match(string) # => true

It really seems like this should be the default way Regexp.new works. Is there a better way to convert a String to a Regexp than Regexp.new(Regexp.escape(string))? This is Ruby, after all.

the Tin Man
Simon Lepkin
  • 5
    `Regexp.new` should NOT work that way, because one could not use "special" regex constructs then. Also, I think `include` will do the same job. Check [*How to check whether a string contains a substring in Ruby?*](http://stackoverflow.com/questions/8258517/how-to-check-whether-a-string-contains-a-substring-in-ruby) – Wiktor Stribiżew Apr 20 '16 at 23:13
  • 1
    `String::include?` is the best way to match a `String` against another `String`, but I don't think it can output a `Regexp`. I buy your point about `Regexp.new`, though. – Simon Lepkin Apr 20 '16 at 23:24
  • 1
    The point is, you do not need a `Regexp` at all to check if a literal `String` is present inside another `String`. That is redundant complication/overhead. – Wiktor Stribiżew Apr 20 '16 at 23:27
  • Sure, but that might not be my goal. For example, if I'm using some gem whose API expects a `Regexp` as input, then neither `"foo(bar)"` nor `Regexp.new("foo(bar)")` will do the trick. – Simon Lepkin Apr 20 '16 at 23:32
  • 1
    String's `[]` allows either fixed strings or regexp, as does `gsub` and `sub`. An API that is restrictive when the class itself allows either seems too rigid. – the Tin Man Apr 20 '16 at 23:52
  • How can you use `include?` in place of `"chacha".scan(/cha/) #=> ["cha", "cha"]`? btw, `String::include?` references a class method of `String`. The reference you want (for an instance method) is `String#include?`. – Cary Swoveland Apr 21 '16 at 04:42
  • 1
    Most regular expression libraries I've used take strings that represent regular expressions and if you have special characters in them then it's your responsibility to escape them either manually, or using the `escape` method. You could always patch in your own `Regexp.string` method if you like. – tadman Apr 21 '16 at 06:17
  • @CarySwoveland Good catch, that comment should say `String#include?`. @tadman @theTinMan Yes, I'm starting to see your point. In most cases, converting a String to a Regex in this manner is not necessary. Let me consider if there's a way I can change the answer (or the question) so that this is useful to future visitors here. I'm sure I'm not the only person to think about this. :) – Simon Lepkin Apr 21 '16 at 23:05
  • @tadman @theTinMan Does the new answer satisfy your concerns? If so, I suppose we should throw away my dangerous first answer (about `::union`) – Simon Lepkin Apr 22 '16 at 00:12

1 Answer

You should never need to run Regexp.new(Regexp.escape(string)). This is because, in core and the standard library, practically every method that takes a Regexp also takes a String (as it should).

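In the original case, if you're trying to find a wacky string with special characters like "foo(bar)" inside a big String big_string, you can just run big_string.include?("foo(bar)") or big_string["foo(bar)"]. One exception is String#match, which compiles a String argument into a Regexp without escaping it, so big_string.match("foo(bar)") will not match the literal text.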
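
As a rough sketch of how a few core String methods handle a String argument (big_string and needle are made-up example values, not from the question), the following can be pasted into irb. Most of these methods search for the literal text, while String#match turns the String into a Regexp without escaping it:

```ruby
big_string = "call foo(bar) twice: foo(bar)"   # hypothetical example text
needle     = "foo(bar)"

# These methods treat a String argument as literal text, so no escaping is needed.
literal_results = {
  include: big_string.include?(needle),    # true
  index:   big_string.index(needle),       # 5
  slice:   big_string[needle],             # "foo(bar)"
  scan:    big_string.scan(needle).size,   # 2
  sub:     big_string.sub(needle, "X")     # "call X twice: foo(bar)"
}

# String#match converts a String argument to a Regexp without escaping it,
# so the parentheses become a capture group and the literal text is not found.
unescaped_match = big_string.match(needle)                  # nil
escaped_match   = big_string.match(Regexp.escape(needle))   # MatchData for "foo(bar)"
```

So for a plain "is this text in there?" check, the literal-text methods are enough, and escaping only matters once the string is handed to something that compiles it as a pattern.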

If you're trying to do something fancier, you might need to use both ::escape and ::new, but never in direct composition. For example, if we want to match big_string on a wacky string followed by a lone digit, we'll run Regexp.new(Regexp.escape(string) + "\\d").
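
A minimal sketch of that composition, assuming string holds the literal text from the question. The interpolated form is an alternative spelling of the same pattern, not something the original answer used. The variable names are illustrative only:

```ruby
string = "foo(bar)"

# Escape only the literal part, then append the regex fragment.
# Inside a double-quoted String the backslash must be doubled.
by_concatenation = Regexp.new(Regexp.escape(string) + "\\d")

# The same pattern via interpolation in a regex literal; here \d needs no doubling.
by_interpolation = /#{Regexp.escape(string)}\d/

with_digit    = by_concatenation.match?("foo(bar)7")   # true
without_digit = by_concatenation.match?("foo(bar)")    # false
```

The interpolated literal avoids the double-backslash pitfall, at the cost of the pattern being built in a regex literal rather than from a String.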

Simon Lepkin
  • Be careful here. `"\d"` is a *literal* `d` in string, but `/\d/` is a digit in a regular expression context. You need `"\\d"` in this case. – tadman Apr 22 '16 at 14:54
  • Yep, that's what I get for not testing code before posting it. – Simon Lepkin Apr 23 '16 at 05:59
  • I don't think this is correct. `"foo(bar)".match("foo(bar)") => nil`, but `"foo(bar)".match(Regexp.escape("foo(bar)")) => #<MatchData "foo(bar)">` – Jakob Egger Mar 08 '17 at 15:31