I made a simple regular expression engine, which supports concatenation, alternation, closure, and char a .. z
.
The way I represent nfa and dfa is to use record:
type state = int with sexp, compare
type alphabet = char with sexp, compare
type transaction = state * alphabet option * state with sexp, compare
type d_transaction = state * alphabet * state with sexp, compare
type state_set = State_set.t
type states_set = States_set.t
type nfa = {
states : State_set.t ;
alphabets : Alphabet_set.t ;
transactions : Transaction_set.t;
start_state : state;
final_states : State_set.t;
}
type dfa = {
d_states : State_set.t ;
d_alphabets : Alphabet_set.t;
d_transactions : D_Transaction_set.t ;
d_start_state : state ;
d_final_states : State_set.t;
}
For example, a string "a*" will be parsed into Closure (Char 'a')
, and then is transformed
to nfa:
states: 0 1 2 3
alphabets: a
transactions: 0->e->1, 1->a>2, 2->e->3, 2->e->1, 0->e->3
start_state: 0
final_states: 3
then dfa:
states: 0 1
alphabets: a
transactions: 0->a->1, 1->a->1
start_state: 0
final_states: 0 1
However, I use a lot of recursion in my code. The way my program generates state number for each node in nfa and dfa is realy unpredictable. I have no idea how to verify if the generated dfa is correct without using a pen and a piece of paper to test it myself
I am trying to figure out a better way to test my code so I can add more features into my program in the future.
Can anyone please give me some suggestions?