Questions tagged [regular-language]

Regular language is a language which can be represented by a regular expression and thus every string in the language can be accepted by the corresponding deterministic finite automaton. Note: Regular Language should not be confused with Regular Expressions. For question regarding pattern matching within strings, use the [regex] tag instead.

Given an alphabet (finite set of symbols) Σ, a language is a set of all sequences of such symbols in that alphabet. A language is a regular language exactly when it can be expressed in terms of a (formal) regular expression and the membership of any string can be decided by a finite-state machine.

Regular languages belong to the highest hierarchy of the Chomsky Hierarchy, and are also called Type-3 grammars. They are above the Type-2 context-free languages which are recognized by pushdown automata, which are above the Type-1 context-sensitive languages recognized by linear bounded automata, and above the Type-0 recursively enumerable languages which can be recognized by Turing Machines. All regular languages are context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic finite state machines and to non deterministic finite machines and still represent the same regular language.

Please do not confuse this with regex. Most regex engines are far more expressive than formal regular expressions, finite state machines, and can represent non-regular languages.

Construction of a Regular Language

The set of all regular languages over a given alphabet Σ can be produced exactly by this process:

The empty language {}, rejecting all strings.
The language containing only the empty string ε
All languages containing only a single symbol s ∈ Σ.
Every language created by the union, concatenation, or kleene-star of regular languages. Suppose v and w are strings of a regular language A and B respectively:
- The union (v|w) is also regular. It accepts languages that are in any of A or B.
- The concatenation vw is also regular.
- The kleene-star v* is also regular. It means any copies of strings in A concatenated, including 0.

Examples and Nonexamples of Regular Languages

Given a simple alphabet Σ = {0, 1}, where | represents union, * represents kleene-star, these formal regular expressions all represent represents a regular language:
- The regular expression "0", "1", "(0|1)", "01", "11", "0*" are all regular.
- The regular expression "(0(0|1)*1)", representing all binary strings beginning with 0 and ending with 1, is regular.
- Given a regular expression R, the language "R+" and "R?" all represent a regular language, whereas + represents one or more, and ? represents zero or one. Namely, "R+" is equivalent to "RR*", and "R?" is equivalent to "(R|ε)".
- Given a regular expression R, the language "R{m,n}" is regular for all natural m,n, where {m,n} represents "from m copies to n copies". This is because it also involves union and concatenation: "R{1,3}" is expanded to "(R|RR|RRR)".
Given an alphabet used by regex engines, usually an ASCII or Unicode alphabet containing all ASCII or Unicode characters respectively:
- The regex /^.+$/ is regular. It includes all non-empty sequences of any character.
- The regex /^#[A-Za-z]{1,3}[0-9]{2,4}$/ represents a regular language, consisting all strings which being with a hashtag, then one to three ASCII letters, followed by two to four decimal digits.
- The regex /^([\d][\w])*$/ represents a regular language. It consists all strings which alternate digit characters and word characters. The shorthand \d and \w are examples of union.
Many regex engines are much more expressive than regular languages. Backreferences can cause a regex to represent a non-regular language, and consequently they cannot be decided by a finite state machine.
- The regex "(.+)\1" represents an irregular language. Involving a backreference capturing the first group .+, it accepts all the sequences of uppercase Latin letters repeated exactly twice. They are called squares in formal language theory.
  - "ABCABC", "1234.1234." are accepted
  - "ABCAB", "1234567891234567890" are rejected.

regex: Short for regular expression. Many regex engines nowadays are far more expressive than formal regular expressions and finite-state machines.
finite-state-machine: They are equivalent in expressiveness to formal regular expressions. They represent exactly the regular languages. Acronyms include FSM.
dfa: Deterministic finite automaton. nfa Nondeterministic finite automaton.
- Both are equivalent in expressiveness.
context-free-grammar, context-sensitive-grammar: Tags referring to lower levels of the Chomsky hierarchy
Wikipedia, includes explanation about squares and irregular regexes
What is a regular language?

914 questions

116

votes

8 answers

Regular vs Context Free Grammars

I'm studying for my computing languages test, and there's one idea I'm having problems wrapping my head around. I understood that regular grammars are simpler and cannot contain ambiguity, but can't do a lot of tasks that are required for…

context-free-grammar regular-language automata

asked Feb 18 '09 at 03:46

Jason Baker

192,085
135
376
510

votes

2 answers

What is a regular language?

I'm trying to understand the concept of languages levels (regular, context free, context sensitive, etc.). I can look this up easily, but all explanations I find are a load of symbols and talk about sets. I have two questions: Can you describe in…

syntax programming-languages bnf regular-language formal-languages

asked Jul 16 '11 at 15:04

FBryant87

4,273
2
44
72

votes

4 answers

Determining whether a regex is a subset of another

I have a large collection of regular expression that when matched call a particular http handler. Some of the older regex's are unreachable (e.g. a.c* ⊃ abc*) and I'd like to prune them. Is there a library that given two regex's will tell me if the…

regex regular-language halting-problem

asked Sep 10 '13 at 21:27

deft_code

57,255
29
141
224

votes

3 answers

chomsky hierarchy in plain english

I'm trying to find a plain (i.e. non-formal) explanation of the 4 levels of formal grammars (unrestricted, context-sensitive, context-free, regular) as set out by Chomsky. It's been an age since I studied formal grammars, and the various definitions…

grammar context-free-grammar regular-language context-sensitive-grammar

asked Dec 06 '11 at 09:57

tylerl

30,197
13
80
113

votes

1 answer

Iranian postal code validation

Please, I need to validate Iranian postal code using regex. I write this regex for this case \b([^02\n\D]){4}[^5](\d){5} but its not working on rule number 5 and 7. please help me to fix it. this is some rules about this regex: It's all numeric 10…

regex regular-language

asked Feb 10 '18 at 10:36

MasOOd.KamYab

votes

2 answers

Representing identifiers using Regular Expression

The regular definition for recognizing identifiers in C programming language is given by letter -> a|b|...z|A|B|...|Z|_ digit -> 0|1|...|9 identifier -> letter(letter|digit)* This definition will generate identifiers of the form identifier:…

regex computer-science regular-language

asked Feb 19 '13 at 09:19

Jeris

2,355
4
26
38

votes

6 answers

Shortest regex for binary number with even number of 0s or odd number of 1s

Write an expression that contains an even number of 0s or an odd number of 1s I got it down to: 1*(01*01*)* + 0*10*(10*10*)* where the first part represents an even number of 0s and the second part an odd number of 1s However, there's supposed to…

regex regular-language finite-automata

asked Dec 10 '13 at 03:28

user3085290

votes

2 answers

Left-Linear and Right-Linear Grammars

I need help with constructing a left-linear and right-linear grammar for the languages below? a) (0+1)*00(0+1)* b) 0*(1(0+1))* c) (((01+10)*11)*00)* For a) I have the following: Left-linear S --> B00 | S11 B --> B0|B1|011 Right-linear S --> 00B…

grammar regular-language computation-theory formal-languages

asked Dec 11 '12 at 08:38

user1585646

votes

13 answers

Regular expression for strings with even number of a's and odd no of b's

Im having a problem in solving the problem:- Its an assignment, i solved it, but it seems to be too long and vague, Can anyboby help me please...... Regular expression for the strings with even number of a's and odd number of b's where the character…

string language-theory regular-language

asked Sep 13 '10 at 07:50

Abu Mathew

votes

3 answers

The difference between Chomsky type 3 and Chomsky type 2 grammar

I'm having trouble articulating the difference between Chomsky type 2 (context free languages) and Chomsky type 3 (Regular languages). Can someone out there give me an answer in plain English? I'm having trouble understanding the whole hierarchy…

regular-language context-free-language chomsky-hierarchy

asked Jul 18 '12 at 19:57

Jay Jenkins

votes

3 answers

Aren't modern regular expression dialects regular?

I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so? What features of modern regular expressions are not regular? Examples would be helpful.

regex regular-language

asked Sep 30 '10 at 05:55

David Johnstone

24,300
14
68
71

votes

1 answer

Finding the complement of a DFA?

I am asked to show DFA diagram and RegEx for the complement of the RegEx (00 + 1)*. In the previous problem I had to prove that the complement of a DFA is closed and is a regular expression also, so I know that to convert a DFA, M to the complement,…

regex regular-language automata dfa nfa

asked Feb 10 '13 at 21:22

Matt Hintzke

7,744
16
55
113

votes

7 answers

Why is {a^nb^n | n >= 0} not regular?

In a CS course I'm taking there is an example of a language that is not regular: {a^nb^n | n >= 0} I can understand that it is not regular since no Finite State Automaton/Machine can be written that validates and accepts this input since it lacks a…

computer-science fsm regular-language

asked Feb 22 '10 at 08:52

Christophe Herreman

15,895
9
58
86

votes

2 answers

Is there a way to negate a regular expression?

Given a regular expression R that describes a regular language (no fancy backreferences). Is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible as…

regex algorithm regular-language

asked Mar 11 '13 at 11:30

fuz

88,405
25
200
352

votes

2 answers

No \p{L} for JavaScript Regex ? Use Unicode in JS regex

I nedd to add a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ x time but I find this very ugly. So I try \p{L} but it does not working in JavaScript. Any Idea ? my actual regex :…

javascript regex regular-language

asked May 04 '18 at 15:35

charles Lgn

2 3

…

60 61 Next

Questions tagged [regular-language]

Construction of a Regular Language

Examples and Nonexamples of Regular Languages

Further Reading