0

I am creating a finite-state automata (FSA), which is basically a tree. Each letter in a word constitutes a node (State). A string of letters constitutes a path through the FSA, and paths can overlap. I seem to be having an issue, though, with the node's next_states, or children.

The input file, vocab.small, looks like this, with a \n at the end of each line:

A 
A A A 
A A B E R G 
A A C H E N

Right now, I construct my FSA very similar to a tree, but where I create a State start_state and State end_state and link them:

class State (object):
    __slots__ = "chars","next_states"
    def __init__(self,chars,next_states=[]):
        self.chars = chars
        self.next_states = next_states

class FSA (object):
    __slots__ = "vocab","start_state","end_state"
    def __init__(self,vocab):
        self.vocab = vocab
        self.start_state = State("0")
        self.end_state = State("1")
        self.end_state.next_states.append(self.start_state)
        self.start_state.next_states.append(self.end_state)

The other methods in FSA are:

    def create_fsa(self):
        vocab_file = open(self.vocab, 'r')
        for word in vocab_file:
            self.add_word(word)
        vocab_file.close()

    def add_word(self,word,current_state=None):
        next_char = word[0:2]
        if current_state == None:
            current_state = self.start_state
        ## next_char found in current_state.next_states
        if next((x for x in current_state.next_states if x.chars == next_char), None):
            if len(word) > 3:
                print "next_char found"
                current_state = next((x for x in current_state.next_states if x.chars == next_char), None)
                self.add_word(word[2:],current_state)
            else:
                print "next_char found + add end"
                current_state = next((x for x in current_state.next_states if x.chars == next_char), None)
                current_state.next_states.append(self.end_state)
##                current_state.children[current_state.children.index(next_char)].append(self.end_state)
        ## next_char NOT found in current_state.next_states
        else:
            current_state.next_states.append(State(next_char))
            current_state = next((x for x in current_state.next_states if x.chars == next_char), None)
            if len(word) > 3:
                print "next_char added"
                self.add_word(word[2:],current_state)
            else:
                print "next_char added + end"
                current_state.next_states.append(self.end_state)
        print

The problem seems to be with current_state.next_states[], as it continues to accrue objects, rather being made new upon each new State. Is the problem in lines like current_state.next_states.append(State(next_char)), with how I create a new State object?

Thanks for the help.

jogojapan
  • 68,383
  • 11
  • 101
  • 131
Adam_G
  • 7,337
  • 20
  • 86
  • 148
  • 1
    The problem is here: `def __init__(self,chars,next_states=[]):`. This is only executed once, and so every `State` instance shares the same `next_states` list when one isn't passed. Read [this question](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument), and don't give up until you see why it makes sense that Python works the way it does. :^) – DSM Feb 23 '13 at 18:51
  • @DSM, thank you so much for this. I cannot tell you how much I appreciate your taking the time to look at my code, and find the issue. I know I posted a lot of code, and so I know it wasn't easy. Reading that other question helped out so much. If you want to make this an "answer", I will mark it correct. Thank you again! – Adam_G Feb 23 '13 at 19:14

0 Answers0