
I have these variables:

keywords = ["/(?=.*?\bTest1\b).*/i","/(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i"]

hash = {"Test2 Test3 irrelevant1"=>"Mon, 16 Feb 2015 09:26:02 +0000", "Test2 Test3 NotThis4 irrelevant2"=>"Mon, 16 Feb 2015 09:24:01 +0000", "Test1 irrelevant3 irrelevant4"=>"Mon, 16 Feb 2015 09:23:02 +0000"}

I need to run:

keywords.each do |regex|
  hash.select{ |k,_| k[regex]}
end

I'm trying to collect the hash entries with the keys "Test2 Test3 irrelevant1" and "Test1 irrelevant3 irrelevant4" in this example. The regular expressions themselves are not my concern, though. What I cannot get my head around is using a regular expression as/in a variable. I tried escaping the \b as \\b, to no avail.

When I set a variable to a regular expression, such as:

regex = "/(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i"

The code:

hash.select{ |k,_| k[regex]}

does not work.

But if I replace the variable with the actual, literal expression:

hash.select{ |k, _| k[/(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i]}

it works just fine.

Also, the functionality works just fine with a literal string variable too:

regex = "Test1"
hash.select{ |k, _| k[regex]}

and with the literal string itself:

hash.select{ |k, _| k["Test1"]}

How do I use regular expressions in a variable, with the functionality at the top? Here again, for good measure:

keywords.each do |regex|
  hash.select{ |k,_| k[regex]}
end

The regex is received as a string:

keywords.map! do |array_lineitem|
  builder = ""
  last = ""
  array_lineitem.each do |string_element|
    if string_element[0] == "-"
      string_element.sub!(/^-/, '')
      last += "(?m)^(?!.*?" + string_element + ")"
    else
      builder += "(?=.*?\b" + string_element + "\b)"
    end
  end
  if last.empty?
    throwback = "/" + builder + ".*/i"
  else
    throwback = "/" + builder + ".*" + last + ".*$" + "/i"
  end
end

To convert the string to a Regexp, I tried the to_regexp gem, Regexp.escape, Regexp.union and eval(string), but again with no luck. With each of these methods, the \b gets converted to \x08.

Frank
  • Just so you know, regular expressions are a lot easier to read if you use the [free-spacing mode](http://ruby-doc.org//core-2.1.1/Regexp.html#class-Regexp-label-Free-Spacing+Mode+and+Comments) – ian Feb 18 '15 at 23:08

3 Answers


Why do you assume it's got anything to do with \b?

When I set a variable to a regular expression, such as:

   regex = "/(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i"

the code

hash.select{ |k,_| k[regex]}

You have not set a variable to a regular expression. You have set a variable to a string that begins and ends with / and has the definition of a regex in it, true. To actually set a variable to a regular expression, you don't use double quotes, which define a string, but like this:

regex = /(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i

Now you have set a variable to a regular expression, not a string containing source code for a regular expression.
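To make the difference concrete, here is a minimal sketch using a shortened pattern (the names `text`, `as_string` and `as_regexp` are only for illustration). It shows that `String#[]` treats a String argument as a plain substring to look for, which is why the quoted version finds nothing:

```ruby
text = "Test2 Test3 irrelevant1"

as_string = "/Test2/i"   # a String that merely looks like a regex
as_regexp = /Test2/i     # an actual Regexp object

puts as_string.class     # String
puts as_regexp.class     # Regexp

# With a String argument, String#[] does a plain substring search,
# so it looks for the literal characters "/Test2/i" and finds nothing.
p text[as_string]        # nil

# With a Regexp argument, String#[] returns the matched portion.
p text[as_regexp]        # "Test2"
```

This also explains why `k["Test1"]` appeared to work in the question: it is a substring lookup, not a regex match.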

Based on your description, I think this is likely your problem. If your problem were actually the definition of the regex itself not matching what you want -- which often does happen with complex regexes like that -- the best way to debug is to start with a much simpler regex, confirm it matches what you want, then incrementally build up your complex regex making sure at each step it's still matching what you expect.

You can generate a regex dynamically, with interpolation. Regex // literals support string interpolation with the #{} construct, same as string literals. For instance:

regex = /(?m)^(?!.*?#{string_element})/

In case your string_element has special regex control characters in it, you probably want to use Regexp.escape, if the string is meant to be matched literally, exactly as written:

regex = /(?m)^(?!.*?#{Regexp.escape string_element})/
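As a rough illustration of why the escaping matters, the sketch below uses a simplified word-boundary pattern and a made-up keyword, `"1.5"`, whose dot would otherwise act as a wildcard. The variable names are assumptions for the example only:

```ruby
string_element = "1.5"   # hypothetical keyword containing a regex metacharacter

loose  = /\b#{string_element}\b/                  # "." matches any character
strict = /\b#{Regexp.escape(string_element)}\b/   # "." matches only a dot

p Regexp.escape(string_element)   # "1\\.5"

p loose.match?("version 1x5")     # true  (probably not intended)
p strict.match?("version 1x5")    # false
p strict.match?("version 1.5")    # true
```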

If you do have a regular expression definition in a string, you can create a regex out of it:

string = 'some?(regex|or)something\Z' # single quotes, so the backslash in \Z is kept
regex  = Regexp.new(string)

puts string.class #=> String
puts regex.class #=> Regexp

I'm not sure if you really want to do that here or not, but you can. I have to admit I don't entirely understand what you're trying to do, and am not confident your approach is the best one for your actual overall goal.
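If you go the `Regexp.new` route, the string should hold only the pattern source, without the surrounding slashes, and options such as case-insensitivity can be passed as the second argument. A small sketch, assuming a pattern shaped like the one in the question:

```ruby
# Single quotes keep \b as backslash + b rather than a backspace character.
source = '(?=.*?\bTest1\b).*'

# Regexp::IGNORECASE plays the role of the trailing /i on a literal.
regex = Regexp.new(source, Regexp::IGNORECASE)

p regex                                 # /(?=.*?\bTest1\b).*/i
p regex.match?("test1 irrelevant3")     # true
p regex.match?("Test10 irrelevant3")    # false, no word boundary after "Test1"
```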

But as far as how to create a regex literal with dynamically interpolated content, and hold it in a variable, it's not a problem, and hopefully this should help.

jrochkind
  • Updated the title accordingly and did an edit regarding the regex as a string. – Frank Feb 16 '15 at 17:43
  • okay, do you understand now? `a = "/foo/"` is not setting a variable to a regex. `a = /foo/` is. That's still, I think, your answer. Does that make sense? – jrochkind Feb 16 '15 at 19:29
  • I get the `"/foo/" != /foo/` part now. Thanks. The problem is, though, that I can either give up generating the regex dynamically, OR generate it dynamically in a string but then be unable to convert the string to a regex because `\b` gets converted to `\x08`. So I'm no less stuck, although I understand more. – Frank Feb 16 '15 at 19:50
  • My remaining problem is that the strings hold a complete regular expression, including starting and ending slashes along with options. But with my understanding of `"regex" != /regex/` and the double-escaping in place, I did manage to focus on and solve this matter in order to convert a complete regex dynamically from string. I will do a write-up. – Frank Feb 17 '15 at 10:40

This isn't hard, but it appears you're making it that way:

foo = '\b[ab]'
Regexp.new(foo) # => /\b[ab]/
/#{foo}/ # => /\b[ab]/

or:

foo = "\\b[ab]"
Regexp.new(foo) # => /\b[ab]/
/#{foo}/ # => /\b[ab]/

Ruby is perfectly happy to use a string to create a pattern; you just have to do it right.

Strings are great building blocks for patterns because we can build patterns up from smaller pieces, then finally join the pieces we want into a large pattern. We do that in all sorts of languages too, not just Ruby.

WORD_BOUNDARY = '\b'
WORD_CHARACTERS = '[a-zA-Z]'
WORD_PATTERN = /#{WORD_BOUNDARY}#{WORD_CHARACTERS}+#{WORD_BOUNDARY}/
WORD_PATTERN # => /\b[a-zA-Z]+\b/

/#{WORD_PATTERN}/ # => /(?-mix:\b[a-zA-Z]+\b)/
Regexp.new(WORD_PATTERN) # => /\b[a-zA-Z]+\b/

It's also important to note the difference between "\b" and '\b'. If the string allows interpolation of variables and escaped values, then \b will be treated as a backspace. That's NOT what you want:

"\b" # => "\b"
"\b".ord # => 8

Instead, use a non-interpreted string:

'\b' # => "\\b"

Or double-escape the word-boundary characters.

You can easily generate a pattern dynamically; you just have to follow the rules for string interpolation and understand that escaped characters have to be double-escaped if the string is interpolated.
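Applied to the kind of lookahead being built in the question, the difference looks roughly like this. The variable names `keyword`, `broken` and `fixed` are made up for the sketch:

```ruby
keyword = "Test1"

broken = "(?=.*?\b" + keyword + "\b)"   # double quotes: "\b" is a backspace (0x08)
fixed  = '(?=.*?\b' + keyword + '\b)'   # single quotes: '\b' is backslash + b

p broken.include?("\x08")   # true
p fixed.include?("\x08")    # false

# The broken pattern looks for literal backspace characters around the keyword.
p Regexp.new(broken).match?("Test1 irrelevant3")   # false
p Regexp.new(fixed).match?("Test1 irrelevant3")    # true
```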

the Tin Man
  • Double-escaping (`\\b`) got me further along. Thanks. The remaining problem after this was the complete regex, with the leading `/` and trailing `/i`, in the string: `foo = "/(?=.*?\\bTest\\b).*/i"` `Regexp.new(foo) # => /\/(?=.*?\bTest\b).*\/i/` But [eval(foo)](http://stackoverflow.com/questions/8652715/convert-a-string-to-regular-expression-ruby) and [Regexp.try_convert(foo)](http://ruby-doc.org//core-2.2.0/Regexp.html#method-c-try_convert) both did the trick, including slashes and options. `eval(foo) # => /(?=.*?\bTest\b).*/i` `Regexp.try_convert(foo) #=> /(?=.*?\bTest\b).*/i` – Frank Feb 17 '15 at 10:48

With Tin Man's array of double-escaped strings:

keywords = ["/(?=.*?\\bTest1\\b).*/i","/(?=.*?\\bTest2\\b)(?=.*?\\bTest3\\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i"]

And this hash:

hash = {"Test2 Test3 irrelevant1"=>"Mon, 16 Feb 2015 09:26:02 +0000", "Test2 Test3 NotThis4 irrelevant2"=>"Mon, 16 Feb 2015 09:24:01 +0000", "Test1 irrelevant3 irrelevant4"=>"Mon, 16 Feb 2015 09:23:02 +0000"}

I can use eval(foo) to convert a string holding a complete regex definition into an actual (non-string) regular expression, as jrochkind describes. With the 'to_regexp' gem installed, Regexp.try_convert(foo) or Regexp.union(foo) can also be used.

keywords.map! do |string|
  eval(string) # or Regexp.try_convert(string) with the 'to_regexp' gem
end 

keywords.map do |regex|  
  hash.select{ |k, _| k[regex]}
end

To get the desired result:

# => [{"Test1 irrelevant3 irrelevant4"=>"Mon, 16 Feb 2015 09:23:02 +0000"}, {"Test2 Test3 irrelevant1"=>"Mon, 16 Feb 2015 09:26:02 +0000"}]

My actual code is now updated and structured like this:

keywords = [["Test1"], ["Test2", "Test3", "-NotThis4", "-NotThis5"]]

hash = {"Test2 Test3 irrelevant1"=>"Mon, 16 Feb 2015 09:26:02 +0000", "Test2 Test3 NotThis4 irrelevant2"=>"Mon, 16 Feb 2015 09:24:01 +0000", "Test1 irrelevant3 irrelevant4"=>"Mon, 16 Feb 2015 09:23:02 +0000"}

keywords.map! do |array_lineitem|
  builder = ""
  last = ""
  array_lineitem.each do |string_element|
    if string_element[0] == "-"
      string_element.sub!(/^-/, '')
      last += '(?m)^(?!.*?' + string_element + ')'
    else
      builder += '(?=.*?\b' + string_element + '\b)'
    end
  end
  if last.empty?
    throwback = "/" + builder + ".*/i"
  else
    throwback = "/" + builder + ".*" + last + ".*$" + "/i"
  end
  eval(throwback) # or Regexp.try_convert(throwback) with the 'to_regexp' gem
end

# => [/(?=.*?\bTest1\b).*/i, /(?=.*?\bTest2\b)(?=.*?\bTest3\b).*(?m)^(?!.*?NotThis4)(?m)^(?!.*?NotThis5).*$/i]

keywords.map do |regex|
  hash.select{ |k, _| k[regex]}
end

# => [{"Test1 irrelevant3 irrelevant4"=>"Mon, 16 Feb 2015 09:23:02 +0000"}, {"Test2 Test3 irrelevant1"=>"Mon, 16 Feb 2015 09:26:02 +0000"}]
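As a possible variation (not the code above, just a sketch along the lines of the other answers), the same patterns can be built without `eval` by leaving out the slashes and passing the option to `Regexp.new`. This avoids evaluating a constructed string as Ruby code, which matters if the keywords ever come from untrusted input. The helper name `build_regex` is hypothetical, and `Regexp.escape` is added on the assumption that keywords are meant literally:

```ruby
keywords = [["Test1"], ["Test2", "Test3", "-NotThis4", "-NotThis5"]]

hash = {
  "Test2 Test3 irrelevant1"          => "Mon, 16 Feb 2015 09:26:02 +0000",
  "Test2 Test3 NotThis4 irrelevant2" => "Mon, 16 Feb 2015 09:24:01 +0000",
  "Test1 irrelevant3 irrelevant4"    => "Mon, 16 Feb 2015 09:23:02 +0000"
}

# Hypothetical helper: builds the same pattern source as above,
# but hands it to Regexp.new instead of wrapping it in slashes for eval.
def build_regex(terms)
  builder = ""
  last = ""
  terms.each do |term|
    if term.start_with?("-")
      # Excluded keyword: strip the leading "-" without mutating the input.
      last += '(?m)^(?!.*?' + Regexp.escape(term[1..-1]) + ')'
    else
      # Required keyword, matched as a whole word.
      builder += '(?=.*?\b' + Regexp.escape(term) + '\b)'
    end
  end
  source = last.empty? ? builder + ".*" : builder + ".*" + last + ".*$"
  Regexp.new(source, Regexp::IGNORECASE)
end

regexes = keywords.map { |terms| build_regex(terms) }
results = regexes.map { |regex| hash.select { |k, _| k[regex] } }

p results.map(&:keys)
# [["Test1 irrelevant3 irrelevant4"], ["Test2 Test3 irrelevant1"]]
```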
Frank