
I'm using Ruby 2.4. I want to build a regular expression that matches an arbitrary number of spaces followed by a letter that occurs in my array. So I tried this:

LETTERS = ["a", "b"]
# => ["a", "b"]
data = ["asdf f", "sdfsdf x"]
# => ["asdf f", "sdfsdf x"]
data.grep(/(^|[[:space:]]+)[#{Regexp.union(LETTERS)}]$/i)
# => ["asdf f", "sdfsdf x"]

But as you can see, even though neither string ends in a letter from my array, both are matched. How do I rewrite my regexp to fix this?
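A minimal sketch of why both strings match, assuming Ruby 2.4 or later (for `Regexp#match?`). Interpolating a Regexp object into a regex literal inserts its `to_s`, so the brackets end up containing the literal characters of `(?-mix:a|b)` instead of a list of letters.

```ruby
LETTERS = ["a", "b"].freeze
union = Regexp.union(LETTERS)

# Interpolation uses Regexp#to_s, which wraps the pattern in an
# inline-options group.
p union.to_s # => "(?-mix:a|b)"

# The question's pattern: the whole group lands inside the brackets.
bad = /(^|[[:space:]]+)[#{union}]$/i
p bad.source # => "(^|[[:space:]]+)[(?-mix:a|b)]$"

# Inside [...], "?-m" is a range from "?" (0x3F) to "m" (0x6D), so "f"
# is in the class; "x" is simply one of the literal characters.
p bad.match?("asdf f")   # => true
p bad.match?("sdfsdf x") # => true
```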

Yu Hao
Dave

2 Answers


Solution

Subtle bugs will appear if you're not very careful with Regexen and interpolation.

You need:

/[[:space:]]+(?:#{Regexp.union(LETTERS).source})$/i

Here's an example:

LETTERS = %w(a b).freeze
data = ['asdf f', 'sdfsdf x', 'test A', 'test a', 'testB', 'testb']
r = /[[:space:]]+(?:#{Regexp.union(LETTERS).source})$/i
# /[[:space:]]+(?:a|b)$/i
data.grep(r)
# ["test A", "test a"]
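The same pattern can be wrapped in a small method so it works for any list of tokens. This is only a sketch: `trailing_token_regex` is a hypothetical name, and the sketch relies on `Regexp.union` escaping metacharacters in String arguments, so multi-character tokens such as `"a.b"` are matched literally.

```ruby
# Hypothetical helper; the pattern itself is the one from the answer.
def trailing_token_regex(tokens)
  # Regexp.union escapes regex metacharacters in each string,
  # .source drops the (?-mix:...) wrapper so the outer /i applies,
  # and (?:...) keeps the whole alternation between the spaces and $.
  /[[:space:]]+(?:#{Regexp.union(tokens).source})$/i
end

r = trailing_token_regex(%w(a b))
p r # => /[[:space:]]+(?:a|b)$/i

# The "." in "a.b" is escaped, so it only matches a literal dot.
dotted = trailing_token_regex(["a.b"])
p dotted.source             # => "[[:space:]]+(?:a\\.b)$"
p dotted.match?("test A.B") # => true
p dotted.match?("test axb") # => false
```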

Bug 1

If you omit Regexp#source:

r2 = /[[:space:]]+(?:#{Regexp.union(LETTERS)})$/i
# /[[:space:]]+(?:(?-mix:a|b))$/i
data.grep(r2)
# ["test a"]

Note that the regex returned by Regexp.union is case sensitive. When it's interpolated into the larger regex, its flags come along with it: (?-mix:a|b) explicitly switches off i, so it doesn't match "test A". Here's a related thread: Interpolating regexes into another regex
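A short sketch comparing the two interpolations on the same input. It only uses `Regexp#casefold?`, `#to_s` and `#source`; the variable names are illustrative.

```ruby
union = Regexp.union(%w(a b))
p union.casefold? # => false
p union.to_s      # => "(?-mix:a|b)"
p union.source    # => "a|b"

# Interpolating the Regexp carries its "-i" along, overriding the outer /i.
with_flags    = /(?:#{union})$/i
# Interpolating only the source leaves the outer /i in charge.
without_flags = /(?:#{union.source})$/i

p with_flags.match?("test A")    # => false
p without_flags.match?("test A") # => true
```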

Bug 2

If you omit the parens around a|b:

r3 = /[[:space:]]+#{Regexp.union(LETTERS).source}$/i
# /[[:space:]]+a|b$/i
data.grep(r3)
# ["test A", "test a", "testB", "testb"]

Because | has the lowest precedence, the pattern is read as [[:space:]]+a or b$: spaces are only required before a, and $ only applies to b. "testB" matches even though it shouldn't.
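A sketch of the precedence problem, showing both halves of the split alternation: the bare b$ branch needs no spaces, and the [[:space:]]+a branch has no end anchor. The strings "testB" and "test ax" are illustrative inputs.

```ruby
src = Regexp.union(%w(a b)).source # "a|b"

# Without a group this is ([[:space:]]+a) OR (b$).
ungrouped = /[[:space:]]+#{src}$/i
grouped   = /[[:space:]]+(?:#{src})$/i

p ungrouped.match?("testB")   # => true  (bare b$ branch)
p grouped.match?("testB")     # => false
p ungrouped.match?("test ax") # => true  ([[:space:]]+a is not anchored)
p grouped.match?("test ax")   # => false
```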

Community
Eric Duminil

Solution

Regexp.new("[[:space:]]+(#{Regexp.union(LETTERS).source})$", Regexp::IGNORECASE)
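A sketch of the `Regexp.new` form in use, assuming the goal is the question's end-of-string match. It anchors with `$` like the question's pattern and uses a non-capturing `(?:...)` group; both are choices of this sketch rather than the answer's exact code.

```ruby
LETTERS = ["a", "b"].freeze

# .source has no inline flags, so Regexp::IGNORECASE covers the letters too.
regex = Regexp.new("[[:space:]]+(?:#{Regexp.union(LETTERS).source})$", Regexp::IGNORECASE)

p regex # => /[[:space:]]+(?:a|b)$/i
p ["asdf f", "sdfsdf x", "sdfsdf A", "sdfsdf ab"].grep(regex) # => ["sdfsdf A"]
```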

Interpolating the union without .source seems to work at first:

LETTERS = ["a","b"]
#=> ["a","b"]
regex = Regexp.new("[[:space:]]+#{Regexp.union(LETTERS)}", Regexp::IGNORECASE)
#=> /[[:space:]]+(?-mix:a|b)/i
data = ["asdf f", "sdfsdf x"]
#=> ["asdf f", "sdfsdf x"]
data.grep(regex)
#=> []
data = ["asdf f", "sdfsdf a"]
#=> ["asdf f", "sdfsdf a"]
data.grep(regex)
#=> ["sdfsdf a"]

But the inner regular expression, (?-mix:a|b), will not ignore case, so a string like "sdfsdf A" is not matched. Thanks to @EricDuminil's solution, it's easy to see the mistake.

David Lilue
  • Thanks, but this is not quite the same as what I had. How do I replicate the "/i" (case-insensitive match)? Also "\s+" is not the same thing as "[[:space:]]+" – Dave Mar 26 '17 at 03:01
  • @Dave `\s+` is similar to `[[:space:]]+` but you're right, they're not the same. I edited to ignore case. – David Lilue Mar 26 '17 at 03:12
  • `Regexp::IGNORECASE` is still ignored by the inner regex. Your regex doesn't match `"test A"` for example. – Eric Duminil Mar 26 '17 at 11:16
  • @EricDuminil @Dave didn't mention that strings like 'text A' should be accepted, but if so, your answer is correct. – David Lilue Mar 26 '17 at 16:04
  • @DavidLilue: `/i` cannot apply to `[[:space:]]+`, so it must obviously apply to the `LETTERS` union. It means it is expected that `"test A"` matches, even though it's not explicitly mentioned in the question. BTW, it's perfectly fine to link to other answers; it is not okay to simply copy-paste them, though. – Eric Duminil Mar 26 '17 at 17:45
  • @EricDuminil Sorry about that. Anyone could copy-paste code from anywhere; the important thing is to say where it came from. My intention was to make it clear that your answer is the solution and that mine had some errors you helped me understand. Isn't that the idea of a discussion? – David Lilue Mar 26 '17 at 18:00
  • Discussion is perfectly fine, that's what the comments or chat are for. Thanks for removing the copy-paste. Your answer is now still buggy but accepted. Oh well... :) – Eric Duminil Mar 26 '17 at 18:05
  • @EricDuminil I can't leave an accepted solution with bugs. It's fixed. – David Lilue Mar 26 '17 at 18:21
  • I understand, you didn't pick your answer. Feel free to put the correct code at the beginning, SO users might use the first line of code they see. – Eric Duminil Mar 26 '17 at 18:24
  • Please don't use "edit" or "update" tags in questions or answers. "[Should 'Edit:' in edits be discouraged?](https://meta.stackoverflow.com/a/255685/128421)". We can see what changed and when if we need to know. – the Tin Man Mar 27 '17 at 20:47