
Is there a canonical ordering of submatch expressions in a regular expression?

For example: What is the order of the submatches in
"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)" ?

a. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)  
   (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))  
   ([A-Z]+)  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  

b. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)  
   (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([A-Z]+)  

or

c. something else.
Jon Seigel
user16753

3 Answers


They tend to be numbered in the order the capturing parens start, left to right. Therefore, option b.
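One way to see this ordering in practice is to run the question's pattern through an engine and print each group. Below is a minimal sketch using Python's `re` module, which follows the same left-to-right rule; the input string is a made-up sample, and the dots are escaped as in options a and b.

```python
import re

# The question's pattern, with the dots escaped as in options a and b.
pattern = re.compile(r"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)")

# Hypothetical sample input chosen for illustration.
m = pattern.match("192.168.001.010 HOST")

# Group 0 is the whole match; groups 1..6 follow the order of their opening parens.
for i in range(pattern.groups + 1):
    print(i, m.group(i))
```

Group 1 is the outer IP-address group, groups 2 to 5 are the four octets, and `([A-Z]+)` comes last as group 6, which matches option b.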

jjrv

In Perl 5 regular expressions, answer b is correct. Submatches are numbered in the order of their opening parentheses.

Many other regular expression engines take their cues from Perl, but you would have to look up individual implementations to be sure. I'd suggest the book Mastering Regular Expressions for a deeper understanding.
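Python's `re` module is one engine that follows the Perl convention, and it applies the same rule to named groups. A quick check, assuming Python 3, is to inspect `groupindex`, which maps each group name to the number the engine assigned it. The pattern below is an illustrative example, not one from the question.

```python
import re

# Illustrative pattern mixing named and unnamed groups.
p = re.compile(r"(?P<outer>(?P<first>[0-9]{3})\.([0-9]{3}))\s+(?P<word>[A-Z]+)")

# groupindex maps each name to its group number; sort by number for readability.
print(sorted(p.groupindex.items(), key=lambda kv: kv[1]))
# → [('outer', 1), ('first', 2), ('word', 4)]

m = p.match("123.456 ABC")
# The unnamed group is number 3 because its opening paren is the third one.
print(m.group(3))  # → 456
```

Other engines may assign numbers differently (particularly when named groups are involved), so it is worth checking your own engine's documentation or running a similar test.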

brian d foy
Adrian Dunston

You count opening parentheses, left to right. So the order would be

(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))
([0-9]{3})
([0-9]{3})
([0-9]{3})
([0-9]{3})
([A-Z]+)

At least this is what Perl would do. Other regex engines might have different rules.
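The "count opening parentheses" rule can be sketched as a small scanner. This is a simplified illustration assuming Python's regex syntax, not a full parser: the function name `capture_group_offsets` is hypothetical, it skips backslash escapes and character classes, treats `(?P<name>` as capturing and every other `(?...)` construct as non-capturing, and does not handle a `]` placed first inside a character class.

```python
import re

def capture_group_offsets(pattern: str) -> list:
    """Return the index of each capturing '(' in pattern, in group-number order.

    Simplified sketch for Python regex syntax: skips escapes and character
    classes; only plain '(' and '(?P<name>' count as capturing groups.
    """
    offsets = []
    i = 0
    in_class = False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if c == "]":
                in_class = False  # end of a [...] character class
        elif c == "[":
            in_class = True  # parens inside [...] are literal characters
        elif c == "(":
            # Plain '(' or a named group '(?P<' opens a capturing group;
            # '(?:', lookarounds, '(?P=name)' etc. do not.
            if not pattern.startswith("(?", i) or pattern.startswith("(?P<", i):
                offsets.append(i)
        i += 1
    return offsets

pattern = r"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)"
offsets = capture_group_offsets(pattern)
for number, start in enumerate(offsets, start=1):
    print(number, pattern[start:start + 12])

# Cross-check the count against Python's own regex parser.
assert len(offsets) == re.compile(pattern).groups
```

Because group numbers come purely from where each capturing `(` appears in the pattern text, the outer group (which opens first) is group 1, and `([A-Z]+)`, whose paren opens last, is group 6.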

Asgeir S. Nilsen