The simplest representation of a statemachine is as a graph.
In most languages, you can use an array whose indices represent states, and whose elements are list of transitions to other states. It is often convenient to associate a human-readable state name with each index to support debugging. Transitions need two parts: a transition condition and a target statenumber.
We design an "internal" DSL in python by naming the graph constructors and elements we need. We can define a state machine (forgive my Python, I'm not an expert):
Class Transition // defines a single transition
def __init__(self, condition, target):
self.condition = condition // store a python lambda here
self.target = target // may be a state name (optimize to a state index)
Class State
def __init__(self, state_name, transitions):
self.state_name=state_name // text name of state
self.transitions=transitions // a "set" (array) of transitions
Class StateMachine:
def __init__(self, states):
self.states = states // a "set" (array) of named states
def AddState(self, state):
self.states.append(state)
def compile(self):
// replaces state names with state indexes; left as reader exercise
We can construct a statemachine to recognize typical identifier name (same as regexp "[A-Z][A-Z0-9]*"):
MyIdentifierRecognizer = StateMachine([])
MyIdentifierRecognizer.AddState(State("ScanFirstIdentifierCharacter",
[Transition(lambda x: IsLetter(x),"ScanNthIdentifierCharacter"),
Transition(lambda x: not(IsLetter(x)),"NotAnIdentifier)]))
MyIdentifierRecognizer.AddState(State("ScanNthIdentifierCharacter",
[Transition(lambda x: IsLetterOrDigit(x),"ScanNthIdentifierCharacter"),
Transition(lambda x: not(IsLetterOrDigit(x)),"EndOfValidIdentifier)]))
MyIdentifierRecognizer.AddState(State("NotAnIdentifier",[])) // no further transitions posssible
MyIdentifierRecognizer.AddState(State("EndOfValidIdentifier",[])) // no further transitions possible
Running the state machine is easy. We first set up a current_state to the initial state of the state machine:
current_state = "ScanFirstIdentifierCharacter"
then given an input character:
character = ...
we locate the current_state in the state machine, and then process the transitions looking for one that is satisfied to change current_state:
for s in StateMachine.states // find the actual state matching current_state
if s.name == current_state
state = s
break
for t in state.transitions // process all transitions (assumed mutually exclusive)
if t.condition(character)
current_state=t.target
break // adding this line means "take the first valid transition"
You can make this more efficient by running a little "compiler" after defining the state machine, that replaces textual state names with array indexes. Then the state lookup can be a direct array index.
If interpreting this state machine isn't fast enough, you can compile the individual states to their equivalent target language code and import that code. Done properly, this will actually produce very fast state machines.
[Note: author added an example flowchart to his question after I completed this answer. I leave it to him to convert his flowchart in this state machine notation].
One other consideration motivated by OP's original question related to decision trees: how are the transition predicates stored? In my version here, they are "opaque" Python lambdas and that will satisfy some needs.
If OP wants to store complex transition predicates in a non-opaque way,
he'll need the "transition" field to contain an abstract syntax tree representing the boolean condition of interest. Details on how to parse (boolean) and execute boolean expressions can be found at https://stackoverflow.com/a/2336769/120163