There are a few misunderstandings about some Coq concepts which I'll try to clarify.
First, in Coq, you shouldn't view Set
as what we call "set" in traditional mathematics. Instead, you should view it as a type. Coq also has Type
, but for the purposes of this post you can view both Set
and Type
as being interchangeable. To avoid confusion, from now on, I will use "set" to refer to the usual concept of set in traditional mathematics, and "type" to refer to elements of Set
and Type
in Coq.
So, what exactly is the difference between a set and a type? Well, in normal mathematics, it makes sense to ask oneself whether anything is a member of any given set. Thus, if we were to develop the theory of regular expressions in normal mathematics, and each regular expression were seen as a set, it would make sense to ask questions such as 0 \in EmptyLang
, because, even though 0
is a natural number, it could be the element of any set a priori. As a less contrived example, the empty string is both a member of the full language (i.e. the one that contains all strings) and of the Kleene closure of any base language.
In Coq, on the other hand, each valid element of the language has exactly one type. The empty string, for instance, has type list A
for some A
, which is written [] : list A
. If we try to ask whether []
belongs to some regular language using the "has type" syntax, we get an error; try typing e.g.
Check ([] : EmptyLang).
Both sets and types can be seen as collections of elements, but types are in a sense more restrictive: for instance, one can take the intersection of two sets, but there's no way of taking the intersection of two types.
Second, when you write
Inductive RegularLanguage (A : Set) : Set := (* ... *)
this does not mean that the elements that you list below the header define sets nor types. What this means is that, for each type A
(the (A : Set)
part), you are defining a new type noted RegularLanguage A
(the RegularLanguage (A : Set) : Set
part), whose elements are freely generated by the list of constructors given. Thus, we have
EmptyLang : RegularLanguage nat
RegularLanguage nat : Set
but we don't have
EmptyLang : Set
(once again, you can try typing all the above judgments into Coq to see which are accepted and which aren't).
Being "freely generated" means, in particular, that the constructors you listed are injective and disjoint. As larsr noted previously, in particular, it is not the case that Union l1 l2 = Union l2 l1
; as a matter of fact, you will usually be able to prove Union l1 l2 <> Union l2 l1
. The problem is that there is a mismatch between the notion of equality that you get for inductively defined types in Coq (which you can't change) and your intended notion of equality for regular languages.
While there are a few ways around this, I think the easiest one is to use the setoid rewrite feature. This would involve first defining a function or predicate (e.g., as larsr suggested, a boolean function regexp_match : RegularLanguage A -> list A -> bool
) to determine when a regular language contains some string. You can then define an equivalence relation on languages:
Definition regexp_equiv A (l1 l2 : RegularLanguage A) : Prop :=
forall s, regexp_match l1 s = regexp_match l2 s.
and use setoid rewrite to rewrite with this equivalence relation. There is a small caveat, however, which is that you can only rewrite with an equivalence relation in contexts that are compatible with this equivalence relation, and you need to explicitly prove lemmas to do so. You can find more details in the reference manual.