6

I have heard many good things about OpenFST, yet I struggle with making it work. I am constructing an FST automaton (fstcompile) that I want to use as an acceptor to check if a set of strings are matching (very much alike regular expressions but with the advantages provided by optimizations of the automatons provided by OpenFST). And here is the thing:
How to check if the resulting automaton accepts a string?

I found a suggestion that the input string shall be turned into a simple automaton and composed with the accepting automaton to get a result. I found it highly cumbersome and strange. Is there an easier way (either via cmd line or Python/C++)?

sophros
  • 14,672
  • 11
  • 46
  • 75
  • It is not too hard to create a python script that specifies the linear automaton. Not sure why it wasn't included in OpenFST itself – qwr Sep 26 '18 at 03:30
  • 1
    Precisely! The documentation is written in terms of abstract math which is quite deterring for new users. It would be a pity to scare away users from otherwise very fine library! – sophros Sep 29 '18 at 06:38

1 Answers1

2

Here's a quick example on how you can test whether an automaton accepts a string using Open FST's Python wrapper. Indeed, you have to turn your input into an automaton, and Open FST doesn't even create this "linear chain automata" for you! Fortunately, it's simple to automate this process as seen below:

def linear_fst(elements, automata_op, keep_isymbols=True, **kwargs):
    """Produce a linear automata."""
    compiler = fst.Compiler(isymbols=automata_op.input_symbols().copy(), 
                            acceptor=keep_isymbols,
                            keep_isymbols=keep_isymbols, 
                            **kwargs)

    for i, el in enumerate(elements):
        print >> compiler, "{} {} {}".format(i, i+1, el)
    print >> compiler, str(i+1)

    return compiler.compile()

def apply_fst(elements, automata_op, is_project=True, **kwargs):
    """Compose a linear automata generated from `elements` with `automata_op`.

    Args:
        elements (list): ordered list of edge symbols for a linear automata.
        automata_op (Fst): automata that will be applied.
        is_project (bool, optional): whether to keep only the output labels.
        kwargs:
            Additional arguments to the compiler of the linear automata .
    """
    linear_automata = linear_fst(elements, automata_op, **kwargs)
    out = fst.compose(linear_automata, automata_op)
    if is_project:
        out.project(project_output=True)
    return out

def accepted(output_apply):
    """Given the output of `apply_fst` for acceptor, return True is sting was accepted."""
    return output_apply.num_states() != 0

Let's define a simple Acceptor that only accepts series of "ab":

f_ST = fst.SymbolTable()
f_ST.add_symbol("<eps>", 0)
f_ST.add_symbol("a", 1)
f_ST.add_symbol("b", 2)
compiler = fst.Compiler(isymbols=f_ST, osymbols=f_ST, keep_isymbols=True, keep_osymbols=True, acceptor=True)

print >> compiler, "0 1 a"
print >> compiler, "1 2 b"
print >> compiler, "2 0 <eps>"
print >> compiler, "2"
fsa_abs = compiler.compile()
fsa_abs

enter image description here

Now we can simply apply the Acceptor using :

accepted(apply_fst(list("abab"), fsa_abs))
# True
accepted(apply_fst(list("ba"), fsa_abs))
# False

To see how to use the transducer look at my other answer

Yann Dubois
  • 1,195
  • 15
  • 16
  • thank you for the detailed example. I am wondering if adding `c` in `f_ST.add_symbol("c", 3)` to the transducer vocabulary is really needed? It looks to me somewhat superfluous. – sophros Jan 07 '19 at 08:14
  • @sophros good catch, I modified my example but forgot to update the symbol table. I've now edited it. – Yann Dubois Jan 07 '19 at 12:34