
Please help me understand this ABNF rule ([a-z]* [A-Z]* [0-9]*)*.
I think it can be converted to the regex [a-zA-Z0-9]*, so the ABNF rule should match any combination of lowercase letters, uppercase letters, and digits, in any order. For example, the strings below should match the rule.

"ABC", "abc", "abc12", "aAbC876", "123go", etc.

If the ABNF rule were ([a-z]* [A-Z]* | [0-9]*)*, I think it could also be converted to the same regex.

Verifying a regex is easy, but is there a tool that can verify my understanding of these ABNF rules? Or can anyone confirm or correct me, please?

Community
canoe

2 Answers


Internet specifications often need to define a format syntax. Augmented Backus-Naur Form (ABNF) is a modified version of Backus-Naur Form (often used to describe the syntax of languages used in computing) and has been popular among many of these specifications for balancing compactness and simplicity.

ABNF adds a set of predefined core rules, such as ALPHA and DIGIT, on top of standard BNF.

Your rule:

([a-z]* [A-Z]* [0-9]*)*

Explanation as an ABNF rule:

(  )        Elements enclosed in parentheses are treated as a 
            single element whose contents are strictly ordered.  
[  ]        Square brackets enclose an optional element sequence
a-z A-Z     Letters covered by the ALPHA core rule
0-9         Digits covered by the DIGIT core rule
*           Repetition (in RFC 5234 ABNF the operator precedes
            the element, as in *DIGIT)

Translated into an extended regular expression, your rule is almost identical; only the spaces are dropped.

([a-z]*[A-Z]*[0-9]*)*

Explanation:

(           group and capture to \1 (0 or more times)
 [a-z]*     any character of: 'a' to 'z' (0 or more times)
 [A-Z]*     any character of: 'A' to 'Z' (0 or more times)
 [0-9]*     any character of: '0' to '9' (0 or more times)
)*          end of \1 
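To try the translated expression against the sample strings from the question, a minimal sketch along these lines could be used. It assumes Python's `re` module as the regex engine (the question does not name one), and the helper name `matches` is made up for illustration.

```python
import re

# ERE from the answer above, compiled with Python's re module (an assumption:
# any engine with the same basic syntax should behave the same way here).
RULE = re.compile(r"([a-z]*[A-Z]*[0-9]*)*")

# Sample strings taken from the question.
SAMPLES = ["ABC", "abc", "abc12", "aAbC876", "123go"]


def matches(text: str) -> bool:
    """Return True when the whole string is accepted by RULE."""
    return RULE.fullmatch(text) is not None


results = {sample: matches(sample) for sample in SAMPLES}
for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```

`fullmatch` is used rather than `search` because the pattern can match the empty string, so `search` would succeed on any input.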

ABNF rules are similar to regular expressions: both support grouping, repetition, alternatives, and value ranges.

hwnd

The direct translation of the ABNF rule you quote:

([a-z]* [A-Z]* [0-9]*)*

would be an ERE (Extended Regular Expression) like this, which omits the spaces:

([a-z]*[A-Z]*[0-9]*)*

Both mean 'zero or more repeats of: a sequence of zero or more lower case letters, followed by zero or more upper case letters, followed by zero or more digits'.
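The role of the outer repetition can be seen by comparing one pass of the sequence with the repeated group. This is only a sketch using Python's `re` module; the names `INNER`, `REPEATED`, and `accepts` are illustrative and not part of the question.

```python
import re

# One pass: lower case letters, then upper case letters, then digits.
INNER = re.compile(r"[a-z]*[A-Z]*[0-9]*")
# Zero or more passes of the same sequence.
REPEATED = re.compile(r"([a-z]*[A-Z]*[0-9]*)*")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


in_order = "abcXYZ123"   # fits a single pass
out_of_order = "123go"   # digits come first, so it needs two passes

for text in (in_order, out_of_order):
    print(text, accepts(INNER, text), accepts(REPEATED, text))
```

A single pass enforces the order of the three terms; the repeated group lets a new pass start whenever the order would otherwise be violated.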

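Because every term inside the group is optional, a single pass of the outer repetition can match just one character from any of the three classes. In this case (but care is required in general) you can therefore simplify that to: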

[a-zA-Z0-9]*
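One way to gain some confidence in this simplification is a brute-force comparison over short strings. The sketch below assumes Python's `re` module and a small test alphabet chosen for illustration (one character per class plus `-`, which no class accepts). It is evidence for short inputs, not a proof.

```python
import re
from itertools import product

FULL = re.compile(r"([a-z]*[A-Z]*[0-9]*)*")
SIMPLE = re.compile(r"[a-zA-Z0-9]*")

ALPHABET = "aZ5-"  # lower, upper, digit, and one rejected character
MAX_LEN = 5        # kept small so the search stays fast


def disagreements() -> list[str]:
    """Return every test string on which the two patterns disagree."""
    found = []
    for length in range(MAX_LEN + 1):
        for chars in product(ALPHABET, repeat=length):
            text = "".join(chars)
            full_ok = FULL.fullmatch(text) is not None
            simple_ok = SIMPLE.fullmatch(text) is not None
            if full_ok != simple_ok:
                found.append(text)
    return found


print(disagreements())
```

An empty list means the two expressions accepted and rejected exactly the same strings within the tested range.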

Your alternative ABNF rule can also be translated to the same EREs, but again it is only because of the nature of this specific case — the translation is not automatically valid.
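As a sketch of why the translation depends on the specific case, the snippet below compares the alternative rule with a hypothetical variant in which each term must occur at least once. The `STRICT` pattern is not from the question; it is an assumption introduced only to show a rule that looks similar but accepts a different set of strings. Python's `re` module is assumed as the engine.

```python
import re

ALTERNATIVE = re.compile(r"([a-z]*[A-Z]*|[0-9]*)*")
SIMPLE = re.compile(r"[a-zA-Z0-9]*")
# Hypothetical variant, not from the question: every term is mandatory.
STRICT = re.compile(r"([a-z]+[A-Z]+[0-9]+)*")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


for text in ["aAbC876", "123go", "aB1cD2", "abc"]:
    print(text,
          accepts(ALTERNATIVE, text),
          accepts(SIMPLE, text),
          accepts(STRICT, text))
```

The alternative rule agrees with the simplified class on these inputs, while the mandatory-term variant rejects strings such as `abc`, so it could not be collapsed to `[a-zA-Z0-9]*`.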

I assume that the double quotes and commas in your example output are not parts of the strings that should be matched.

Jonathan Leffler