9

What does the "regular" in the phrase "regular expression" mean?

I have heard that regexes were regular at one time, but no more

barlop
  • 12,887
  • 8
  • 80
  • 109
  • Are there any aspects of regular expressions that are not regular? / designed to not just match a regular language? – barlop Jan 26 '11 at 14:56

4 Answers4

11

The regular in regular expression comes from that it matches a regular language.

The concept of regular expressions used in formal language theory is quite different from what engines like PCRE call regular expressions. PCRE and other similar engines have features like lookahead, conditionals and recursion, which make them able to match non-regular languages.

Community
  • 1
  • 1
Sebastian Paaske Tørholm
  • 49,493
  • 11
  • 100
  • 118
  • So I suppose in the BRE days, it was still regular. No conditionals. ERE then added conditionals.. so they became technically able to match irregular languages. Does that sound right? – barlop Jan 26 '11 at 15:09
  • 1
    @barlop No, POSIX BREs have back-references, and back-references are a non-regular feature. – hobbs Jan 26 '11 at 16:05
  • @hobbs I guess backreferences don't cater for irregular grammars either though 'cos they're not really related to parsing, they're for replacing. how is replacing to do with a grammar? it isn't for following a grammar at all, so it's not like conditionals or lookahead – barlop Jan 26 '11 at 17:26
  • @barlop backreferences have everything to do with matching, and using them moves you from the realm of regular grammars to (at least) context-sensitive grammars – hobbs Jan 26 '11 at 17:56
  • 3
    @barlop: backreferences are used in matching, too--for example, `/(\d)\1\1/` to match three of the same digit. Some tools use a theoretically-correct DFA engine by default, but switch to an NFA engine if the regex contains backreferences. ref: http://www.regular-expressions.info/grep.html – Alan Moore Jan 26 '11 at 18:12
  • @Alan Moore So I suppose IEEE's POSIX standard with its ERE, even BRE, and perhaps its SRE, initiated use of the term regular expression in a way that deviated from its definition in formal language theory. Did GNU's grep use the term in that way first? and was it the first? – barlop Jan 26 '11 at 19:17
  • 1
    @barlop POSIX "standardized" regexes after they were already used in a million places, and after many of the implementations made use of backreferences. Numbered capturing groups existed in V6 Unix `ed` (1975), and numbered backreferences existed in V7 Unix `ed` (1979). Perhaps someone else wants to do a more thorough job of antedating, but my understanding is that ed was the original promulgator of regexes. V6 and V7 Unix `grep` are both documented simply as accepting the same regexes `ed` does. :) – hobbs Jan 26 '11 at 20:35
4

It comes from regular language. This is part of formal language theory. Check out the Chomsky hierarchy for other formal languages.

Michael J. Barber
  • 24,518
  • 9
  • 68
  • 88
1

It's signifying that it's a regular language.

Regexes are still popular. Some people frown on them but they remain a quick and easy (if you know how to use them) way of matching certain types of strings. The alternative is often a good few lines of code looping through strings and extracting the bits that you need which is much nastier!

I still use them on a regular (pun fully intended) basis, to give you a use case I used one the other day to match lines of guitar chords as oppose to lyrics. They're also commonly used for things like basic validation of email addresses and the like.

They're certainly not dead.

Michael Berry
  • 70,193
  • 21
  • 157
  • 216
  • The OP didn't suggest that they were no longer used, just that they were no longer regular, which others have confirmed. – Colin Fine Jan 26 '11 at 14:53
1

I think it comes from the term for the class of grammars that regular expressions describe: regular grammars (or "regular" languages). Where that term comes from is likely answered by a trip to Wikipedia.

Modern regex engines that implement all those fancy look-ahead, pattern re-match, and subexpression counting features, well, those are recognizing a class of grammar that's a superset of regular grammars. "Classical" regular expressions correspond in mechanical ways to theoretical machines called "finite automata". That's a really fun subject in and of itself.

Pointy
  • 405,095
  • 59
  • 585
  • 614