I want to know how a comma is interpreted in a regular expression. I did some research online and want to validate my understanding, as explained in the following snippet:

{a,b}* can mean {a+b}* and matches a, aa, b, bb, ab, abbb, aaab, ababab, etc.
Similarly, {a,aba} would match either a or aba.

Or is there another explanation?

jaykio77

1 Answer

Your question seems to refer to regular expressions in formal language theory. In computer programming we use a very particular implementation of regular expression grammar and syntax, one that evolved to include operations that do not belong to the theoretical concept of regular expressions (cf. ).

Wikipedia notes:

The phrase regular expressions, or regexes, is often used to mean the specific, standard textual syntax for representing patterns for matching text, as distinct from the mathematical notation...

Pure regular expressions just have three operators:

  • Concatenation
  • Alternation
  • Kleene closure

The syntax used for these operators can differ. For example, all of the following would mean the same, but with different syntax rules:

  • (a+b)c*
  • (a|b)c*
  • {a,b}c*
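To make the equivalence concrete, here is a minimal sketch that writes the expression in Python's `re` syntax, which uses `|` for alternation. The helper name `in_language` and the sample words are illustrative assumptions, not part of the question.

```python
import re

# The formal expression (a+b)c*, written in Python's regex syntax.
# In Python, "|" is alternation; "+" would mean "one or more" instead.
PATTERN = re.compile(r"(a|b)c*")

def in_language(word: str) -> bool:
    """Return True if the whole word belongs to the language of (a+b)c*."""
    return PATTERN.fullmatch(word) is not None

# Hypothetical sample words: exactly one a or b, followed by any number of c.
samples = ["a", "b", "ac", "bccc", "ab", "c", ""]
accepted = [w for w in samples if in_language(w)]
print(accepted)  # ['a', 'b', 'ac', 'bccc']
```

`fullmatch` is used so that the whole word must belong to the language, which is the usual meaning of "matches" in formal language theory.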

Personally, I think the braces are a bad choice: in formal language theory they are used to denote the alphabet of the regular language, i.e. the set of symbols on which it operates, and in regex they are used as a quantifier that limits how many times the preceding pattern can repeat.
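As an illustration of the quantifier meaning, this small sketch uses Python's `re` module, where `a{2,3}` means "a repeated two or three times" and the comma separates the bounds rather than alternatives. The helper name `matches_fully` is an assumption made for this example.

```python
import re

# In Python's regex syntax, braces are a repetition quantifier:
# a{2,3} matches "a" repeated at least 2 and at most 3 times.
QUANTIFIED = re.compile(r"a{2,3}")

def matches_fully(word: str) -> bool:
    """Return True if the entire word matches a{2,3}."""
    return QUANTIFIED.fullmatch(word) is not None

results = {w: matches_fully(w) for w in ["a", "aa", "aaa", "aaaa"]}
print(results)  # {'a': False, 'aa': True, 'aaa': True, 'aaaa': False}
```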

As to your specific question:

how comma is interpreted

It is the alternation operator. In formal language discussions + is more often used for this operator, while programming languages use the symbol | for it.
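The two expressions from the question can be checked with a short sketch in Python, translating the comma to `|` and the braces to grouping parentheses. The pattern names below are illustrative assumptions.

```python
import re

# {a,b}* from the question, read as (a|b)*: any string over the symbols a and b.
KLEENE = re.compile(r"(a|b)*")
# {a,aba} from the question, read as a|aba: either "a" or "aba".
EITHER = re.compile(r"a|aba")

question_words = ["a", "aa", "b", "bb", "ab", "abbb", "aaab", "ababab"]
all_accepted = all(KLEENE.fullmatch(w) is not None for w in question_words)
print(all_accepted)  # True

either_results = {w: EITHER.fullmatch(w) is not None for w in ["a", "aba", "ab"]}
print(either_results)  # {'a': True, 'aba': True, 'ab': False}
```

Note that `(a|b)*` also accepts the empty string, since the Kleene closure allows zero repetitions.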

trincot