Making and comparing Sets in Coq

Question

I'm having trouble understanding whether it is possible to prove that two sets (in this case regular languages) are identical and thus interchangeable. From what I understand, sets can be equivalent even if they are not constructively equal.

Regular languages are sets of strings, but I don't see how to say that r1 = r2 so that something like symmetry can be used in a proof.

Here is my RegularLanguage declaration:

Inductive RegularLanguage (A : Set) : Set :=
  | EmptyLang : RegularLanguage A
  | EmptyStr : RegularLanguage A
  | Unit : A -> RegularLanguage A
  | Union :
    RegularLanguage A
    -> RegularLanguage A
    -> RegularLanguage A
  | Concat :
    RegularLanguage A
    -> RegularLanguage A
    -> RegularLanguage A
  | KleeneStar : RegularLanguage A -> RegularLanguage A
.

Even though I say the type is Set, this doesn't create a set of strings, as far as I can see. Do I need some function of type RegularLanguage -> Set of strings?

Thank you for the help.

score 8 · Accepted Answer · answered May 04 '15 at 16:45

There are a few misunderstandings about some Coq concepts which I'll try to clarify.

First, in Coq, you shouldn't view Set as what we call "set" in traditional mathematics. Instead, you should view it as a type. Coq also has Type, but for the purposes of this post you can view both Set and Type as being interchangeable. To avoid confusion, from now on, I will use "set" to refer to the usual concept of set in traditional mathematics, and "type" to refer to elements of Set and Type in Coq.

So, what exactly is the difference between a set and a type? Well, in normal mathematics, it makes sense to ask oneself whether anything is a member of any given set. Thus, if we were to develop the theory of regular expressions in normal mathematics, and each regular expression were seen as a set, it would make sense to ask questions such as 0 \in EmptyLang, because, even though 0 is a natural number, it could be the element of any set a priori. As a less contrived example, the empty string is both a member of the full language (i.e. the one that contains all strings) and of the Kleene closure of any base language.

In Coq, on the other hand, each valid element of the language has exactly one type. The empty string, for instance, has type list A for some A, which is written [] : list A. If we try to ask whether [] belongs to some regular language using the "has type" syntax, we get an error; try typing e.g.

Check ([] : EmptyLang).

Both sets and types can be seen as collections of elements, but types are in a sense more restrictive: for instance, one can take the intersection of two sets, but there's no way of taking the intersection of two types.

Second, when you write

Inductive RegularLanguage (A : Set) : Set := (* ... *)

this does not mean that the elements that you list below the header define sets nor types. What this means is that, for each type A (the (A : Set) part), you are defining a new type noted RegularLanguage A (the RegularLanguage (A : Set) : Set part), whose elements are freely generated by the list of constructors given. Thus, we have

EmptyLang : RegularLanguage nat

RegularLanguage nat : Set

but we don't have

EmptyLang : Set

(once again, you can try typing all the above judgments into Coq to see which are accepted and which aren't).

Being "freely generated" means, in particular, that the constructors you listed are injective and disjoint. As larsr noted previously, in particular, it is not the case that Union l1 l2 = Union l2 l1; as a matter of fact, you will usually be able to prove Union l1 l2 <> Union l2 l1. The problem is that there is a mismatch between the notion of equality that you get for inductively defined types in Coq (which you can't change) and your intended notion of equality for regular languages.

While there are a few ways around this, I think the easiest one is to use the setoid rewrite feature. This would involve first defining a function or predicate (e.g., as larsr suggested, a boolean function regexp_match : RegularLanguage A -> list A -> bool) to determine when a regular language contains some string. You can then define an equivalence relation on languages:

Definition regexp_equiv A (l1 l2 : RegularLanguage A) : Prop :=
  forall s, regexp_match l1 s = regexp_match l2 s.

and use setoid rewrite to rewrite with this equivalence relation. There is a small caveat, however, which is that you can only rewrite with an equivalence relation in contexts that are compatible with this equivalence relation, and you need to explicitly prove lemmas to do so. You can find more details in the reference manual.

score 1 · Answer 2 · answered May 04 '15 at 14:57

Two different elements in the set RegularLanguage A are not equal if they are constructed from different constructors. Consider L1 = ( a | b ) and L2 = ( b | a). Here L1 <> L2 because the arguments to the constructors are different.

In your notation, L1 might be Union nat (Unit nat 1) (Unit nat 2)), and L2 is Union nat (Unit nat 2) (Unit nat 1)).

Instead you want to say there is a function regexp_match : A -> RegularLanguage A -> list A -> bool that match strings in the given language. You will have to implement this function.

Two regexps are equal if they match and reject the same strings. For example, for L1 and L2:

Lemma L1_eq_L2 : forall S, regexp_match L1 S = regexp_match L2 S.

If you can prove this, then you can use this to replace regexp_match L1 S with regexp_match L2 S and vice versa where you want.

Making and comparing Sets in Coq

2 Answers2

Linked