Does anybody know why I am getting different results depending on the order of the patterns?

list1 = ["AA1", "AA2","AA", "AA+"]
list2 = ["AA1", "AA2","AA+", "AA"]
results1 = "somethin with AA+ in it".scan(Regexp.union(list1))
results2 = "somethin with AA+ in it".scan(Regexp.union(list2))

results1 outputs "AA"; results2 outputs "AA+".

I may be misunderstanding how scan works, but I was expecting it to return every occurrence, hence both "AA" and "AA+". I also don't understand why the output changes depending on the order of the strings.

Jack
  • An unanchored alternation group matches the first branch that succeeds; once a branch matches, the others are not tested. – Wiktor Stribiżew Jul 04 '16 at 12:37
  • I don't think `Regexp.union()` is doing what you think it is. It creates a single regular expression that matches any of the provided expressions. It does not loop over the list and run one regular expression match at a time. – Phylogenesis Jul 04 '16 at 12:43

1 Answer

In an NFA regex engine, the branches of an alternation group are tried from left to right at each position, and the first branch that matches "wins". See "Alternation with The Vertical Bar or Pipe Symbol" for a more detailed explanation.

The regexes you have are

Regex 1: (?-mix:AA1|AA2|AA|AA\+)
Regex 2: (?-mix:AA1|AA2|AA\+|AA)
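
To check what Regexp.union actually builds, you can inspect the resulting patterns yourself. This is only a small sketch; the variable names regex1 and regex2 are mine, not from the question.

```ruby
list1 = ["AA1", "AA2", "AA", "AA+"]
list2 = ["AA1", "AA2", "AA+", "AA"]

# Regexp.union joins the strings with "|" and escapes regex
# metacharacters, so the literal "+" becomes "\+".
regex1 = Regexp.union(list1)
regex2 = Regexp.union(list2)

# Regexp#source returns the pattern body without the (?-mix: ) wrapper.
puts regex1.source
puts regex2.source

# Regexp#to_s returns the wrapped form shown above.
puts regex1.to_s
puts regex2.to_s
```

The branch order in each pattern follows the order of the input array, which is why the two lists behave differently.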

If you use the first regex, you get AA because the AA branch is tried before AA\+ and it matches. The match is returned, the remaining branches are not tested at that position, and the regex index advances past the match. Scanning then resumes at the "+", where no branch matches.

The second regex yields AA+ because the AA\+ branch is tried first and it matches. The match is returned, and the AA branch is not even tested.
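
The behaviour described above can be reproduced with the sketch below. It also shows two possible workarounds that are my own suggestions, not part of the original code: sorting the strings longest first so that longer alternatives are tried first, and scanning once per pattern if you want every pattern reported separately. The names in_order, longest_first and per_pattern are hypothetical.

```ruby
text  = "somethin with AA+ in it"
list1 = ["AA1", "AA2", "AA", "AA+"]

# Original order: the "AA" branch is tried before "AA+", so it wins.
in_order = text.scan(Regexp.union(list1))

# Assumed workaround 1: try longer alternatives first, so "AA+" is
# tested before its prefix "AA".
longest_first = text.scan(Regexp.union(list1.sort_by { |s| -s.length }))

# Assumed workaround 2: scan once per pattern, so overlapping
# patterns such as "AA" and "AA+" are each reported.
per_pattern = list1.map { |s|
  [s, text.scan(Regexp.new(Regexp.escape(s)))]
}.to_h

p in_order
p longest_first
p per_pattern
```

Sorting by length only helps when the conflict is between a string and its own prefix, as with "AA" and "AA+". A single scan still returns non-overlapping matches only, so the per-pattern loop is the one to use if you really need both "AA" and "AA+" from the same position.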

Wiktor Stribiżew
  • Some related posts: [*Alternation usage creates strange behavior*](http://stackoverflow.com/questions/35987637/alternation-usage-creates-strange-behavior/35987686#35987686) and [*Why regex engine choose to match pattern `..X` from `.X|..X|X.`?*](http://stackoverflow.com/questions/35946342/why-regex-engine-choose-to-match-pattern-x-from-x-xx/35950170#35950170) – Wiktor Stribiżew Jul 04 '16 at 12:49
  • Downvoting for a reason is a way to enrich everyone's knowledge. What is the reason for a downvote here? – Wiktor Stribiżew Sep 16 '16 at 22:28