The information you need is in the documentation that comes with SML, which is available in various places. Many university courses also have online notes that contain working examples.
The first thing to note from your example code is that you have overloaded the name alpha, using it to name both a state and a pattern. This is probably not a good idea. The pattern alphanum is not defined, and the result ID is not declared. These are basic errors which you should fix before thinking about using states, or posting a question here on SO. Asking for help with code that has such obvious faults in it does not encourage help from the experts. :-)
Having fixed up those errors, we can start using states. Here is my version of your code:
datatype lexresult = ID
| EOF
val error = fn x => TextIO.output(TextIO.stdOut,x ^ "\n")
val eof = fn () => EOF
%%
%structure myLang
digit=[0-9];
ws=[\ \t\n];
str=\"[.*]+\";
strop=\[[0-9...?\^]\];
%s ALPHA_STATE;
alpha=[a-zA-Z];
alphanum=[a-zA-Z0-9];
%%
<INITIAL>{alpha} => (YYBEGIN ALPHA_STATE; continue());
<ALPHA_STATE>{alphanum}+ => (YYBEGIN INITIAL; TextIO.output(TextIO.stdOut,"ID\n"); ID);
. => (error ("myLang: ignoring bad character " ^ yytext); lex());
You can see I've added ID to the lexresult datatype, named the state ALPHA_STATE, and added the alphanum pattern. Now let's look at how the state code works:
There are two states in this program, INITIAL and ALPHA_STATE (all lex programs have an INITIAL default state). The lexer always begins recognising in the INITIAL state. The rule <INITIAL>{alpha} => indicates that if a letter is encountered while in the INITIAL state (i.e. NOT in ALPHA_STATE), then it is a match and the action should be invoked. The action for this rule works as follows:
YYBEGIN ALPHA_STATE; (* Switch from INITIAL state to ALPHA_STATE *)
continue() (* and keep going *)
Now that we are in ALPHA_STATE, the rules defined for this state are enabled, which here means the rule <ALPHA_STATE>{alphanum}+ =>. The action on this rule switches back to the INITIAL state and records the match.
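To make the state switching concrete, here is a small hand-written model of the same two-state behaviour in plain SML. It is only a sketch and not what ML-Lex generates: the names scan, dropAlphaNum and the BAD constructor are made up for illustration, it assumes the unprefixed "." rule is active in both states, and it simplifies by treating every unmatched character (including newline) as a bad character.

```sml
(* Hand-written model of the two start states; NOT the code ML-Lex generates. *)
datatype state = INITIAL | ALPHA_STATE
datatype token = ID | BAD of char   (* BAD is a hypothetical stand-in for the error action *)

(* Drop the longest leading run of letters/digits, mimicking {alphanum}+ *)
fun dropAlphaNum (c :: cs) = if Char.isAlphaNum c then dropAlphaNum cs else c :: cs
  | dropAlphaNum [] = []

(* scan : string -> token list *)
fun scan (s : string) : token list =
  let
    fun go (_, [], acc) = List.rev acc
      | go (INITIAL, c :: cs, acc) =
          if Char.isAlpha c
          then go (ALPHA_STATE, cs, acc)        (* YYBEGIN ALPHA_STATE; continue() *)
          else go (INITIAL, cs, BAD c :: acc)   (* catch-all rule, state unchanged *)
      | go (ALPHA_STATE, c :: cs, acc) =
          if Char.isAlphaNum c
          then go (INITIAL, dropAlphaNum cs, ID :: acc)  (* YYBEGIN INITIAL; return ID *)
          else go (ALPHA_STATE, cs, BAD c :: acc)        (* catch-all rule, state unchanged *)
  in
    go (INITIAL, String.explode s, [])
  end

val tokens = scan "hello 42"
```

Tracing "hello 42" through the model: the h moves it into ALPHA_STATE, ello is consumed as the alphanumeric run and yields ID, and the space and the two digits each fall through to the catch-all case because digits cannot start a match in INITIAL. The model also shows a quirk of the rules as written: a one-letter identifier never yields ID, since ALPHA_STATE needs at least one more letter or digit.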
For a longer example of using states (lex rather than ML-lex) you can see my answer here: Error while parsing comments in lex.
To test this ML-LEX program I referenced this helpful question: building a lexical analyser using ml-lex, and generated the following SML program:
use "states.lex.sml";
open myLang
val lexer =
let
fun input f =
case TextIO.inputLine f of
SOME s => s
| NONE => raise Fail "Implement proper error handling."
in
myLang.makeLexer (fn (n:int) => input TextIO.stdIn)
end
val nextToken = lexer();
And just for completeness, running it produced the following output, demonstrating the match:
c:\Users\Brian>"%SMLNJ_HOME%\bin\sml" main.sml
Standard ML of New Jersey v110.78 [built: Sun Dec 21 15:52:08 2014]
[opening main.sml]
[opening states.lex.sml]
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
structure myLang :
sig
structure UserDeclarations : <sig>
exception LexError
structure Internal : <sig>
val makeLexer : (int -> string) -> unit -> Internal.result
end
val it = () : unit
hello
ID