22

Does anyone know any examples of the following?

  1. Proof developments about regular expressions (possibly extended with backreferences) in proof assistants (such as Coq).
  2. Programs in dependently-typed languages (such as Agda) about regular expressions.
  • According to "Mastering Regular Expressions" (a book I recommend, see http://regex.info/) regular expressions are quite non-regular in fact due to their enhanced abilities, so mathematical theory is only available for simple/basic regex types. Does this have implications to use them in proof assistance? – Peter Kofler Jun 08 '09 at 10:09
  • 2
    Yes, it does: it makes proofs more complicated :) In fact, by "regular expressions" I mean the basic ones that are strictly defined in formal language theory. I'd like to know if there were attempts to specify backreferences or other non-regular constructions. I'm aware of quite limited formalisations of basic r.e. in Nuprl and Coq. Since these formalisations stemmed from theory, rather than programming practice, they didn't account for non-regular features. –  Jun 08 '09 at 14:27

5 Answers5

13

Certified Programming with Dependent Types has a section on creating a verified regular expression matcher. Coq Contribs has an automata contribution that might be useful. Jan-Oliver Kaiser formalized the equivalence between regular expressions, finite automata and the Myhill-Nerode characterization in Coq for his bachelors thesis.

anol
  • 8,264
  • 3
  • 34
  • 78
eaubin
  • 658
  • 9
  • 14
  • Right, I have also noted this paragraph in CPDT a few weeks ago. It does hit the ball. As per the automata contribution, it certainly is. However, it uses axioms. E.g., 4 axioms in the proof that regular languages are defined by NFAs (module RatReg). A proof without axioms is also possible (but not contained in the Coq contributions). –  Aug 14 '09 at 21:49
10

Moreira, Pereira & de Sousa, On the Mechanisation of Kleene Algebra in Coq gives a nice verified construction of the Antimirov derivative of regexps in Coq. It's pretty easy to read off a CFA from this construction, and to compute the intersection of regexps.

I'm not sure why you separate Coq from dependently typed programming: Coq essentially is programming in a polymorphic dependently typed lambda calculus with inductive types (i.e., CIC, the calculus of inductive constructions).

I've never heard of a formalisation of regexps in a dependently typed language, nor have I heard of something such as an Antimirov-like derivative for regexps with backtracking, but Becchi & Crowley, Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions provide a notion of finite-state automata that matches a Perl-like regexp languages. That might attractive to formalisers in the near future.

Charles Stewart
  • 11,661
  • 4
  • 46
  • 85
7

See Perl Regular Expression Matching is NP-Hard

Regex matching is NP-hard when regexes are allowed to have backreferences.

Reduction of 3-CNF-SAT to Perl Regular Expression Matching

[...] 3-CNF-SAT is NP-complete. If there were an efficient (polynomial-time) algorithm for computing whether or not a regex matched a certain string, we could use it to quickly compute solutions to the 3-CNF-SAT problem, and, by extension, to the knapsack problem, the travelling salesman problem, etc. etc.

Community
  • 1
  • 1
Eugene Yokota
  • 94,654
  • 45
  • 215
  • 319
  • 5
    Even more than that: regular expression pattern matching with backreferences has been proved to be NP-complete by Alfred Aho quite some time ago by reduction from the vertex-cover problem. See Theorem 6.2 from [A. V. Aho. Algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, volume A: Algorithms and Complexity, pages 255–300. The MIT Press, 1990]. –  Jun 14 '09 at 08:23
6

I don't know of any development that treats regular expressions by themselves.

Finite automata, however, relevant since NFAs are the standard way to match those regular expressions, have been studied in NuPRL. Have a look at : Robert L. Constable, Paul B. Jackson, Pavel Naumov, Juan Uribe. Constructively Formalizing Automata Theory.

Should you be interested in approaching those formal languages through algebra, esp. developing finite semigroup theory, there are a number of algebra libraries developed in various theorem provers that you could think of using, with one particularly efficient in a finite setting.

Francois G
  • 11,957
  • 54
  • 59
5

The proof assistant Isabelle/HOL ships a number of formalized proofs regarding regular expressions (without back reference): http://afp.sourceforge.net/browser_info/devel/HOL/Regular-Sets/

(here is a paper by the authors regarding what they did exactly).

Another approach is to characterize regular expressions via Myhill-Nerode Theorem: http://www.dcs.kcl.ac.uk/staff/urbanc/Publications/itp-11.pdf

Matt W-D
  • 1,605
  • 2
  • 19
  • 22