
I have written Python code that trains the Brill tagger from the NLTK library on about 8,000 English sentences and then uses it to tag about 2,000 sentences.

The Brill tagger takes many, many hours to train. When it finally finished training, the last statement of the program contained a tiny mistake, so the program crashed without producing any output.

Is it possible to keep the tagger in its trained state, fix the error, and get the program running without waiting several hours for the tagger to be retrained on the very same data?

Jongware
singhuist

1 Answer


Yes! You have a few options. One quick-and-dirty trick I use frequently is dropping into an interactive console. Add this to the end of your script (right after training finishes):

model = train_for_hours_and_hours()  # placeholder for your own training code

import code
code.interact(local=locals())

This works just like the REPL you get from running python3, except that all of your script's variables (including your trained model) are available:

$ python3 script.py
[ ... THREE HOURS LATER ... ]
>>> print(model)
<...your trained model...>
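If you would rather not sit at the console on every run, a variant is to open it only when the post-training step fails. Below is a minimal stdlib sketch; `finish_or_debug` and `final_step` are hypothetical names standing in for your own code.

```python
import code
import traceback


def finish_or_debug(model, final_step):
    """Run the post-training step; if it raises, open a console with `model`.

    `final_step` is a hypothetical callable standing in for whatever the
    script does after training (e.g. tagging the test sentences).
    """
    try:
        return final_step(model)
    except Exception:
        # Print the error, then hand control to an interactive console so the
        # trained model is still in memory instead of being lost on exit.
        traceback.print_exc()
        code.interact(
            banner="final step failed; the trained model is available as `model`",
            local={"model": model, "final_step": final_step},
        )
        return None
```

With this, a mistake in the last line of the script leaves you at a prompt with `model` still alive rather than discarding hours of training. For example, `finish_or_debug(model, lambda m: m.tag_sents(test_sents))`, where `test_sents` is your own list of tokenized sentences.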

A more permanent solution is to serialize your model and save it to a file right after training finishes. You can use pickle for this:

import pickle
MODEL_FILE = 'model.pickle'

try:
    # Try to load the model from disk
    with open(MODEL_FILE, 'rb') as f:
        model = pickle.load(f)
except FileNotFoundError:
    # Train the model if it doesn't exist yet
    model = train_for_hours_and_hours()
    with open(MODEL_FILE, 'wb') as f:
        pickle.dump(model, f)  # object first, then the file

# now use `model` here
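The same pattern can be packaged as a reusable helper. The sketch below is stdlib only, and `load_or_train` and `train` are hypothetical names. It also writes to a temporary file first and renames it into place with `os.replace`, so a crash during the dump cannot leave a half-written `model.pickle` that would fail to load on the next run. It assumes your trained tagger is picklable.

```python
import os
import pickle
import tempfile


def load_or_train(path, train):
    """Return the model cached at `path`, training and caching it if missing."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        pass

    model = train()  # the slow part: only runs when no cache file exists

    # Write to a temporary file in the same directory, then atomically rename
    # it into place so an interrupted write never leaves a truncated cache.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(model, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return model
```

Only unpickle files you created yourself: loading a pickle can execute arbitrary code.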
Bailey Parker
  • Sounds brilliant. But I believe I would still have to run the program again after adding all this, just to protect against this in the future. Isn't there a way to get something out of my current run, where the tagger is completely trained and only the tagging remains, without having to wait for hours? – singhuist Jan 20 '18 at 20:40
  • Yeah, you do. Unfortunately, it's unlikely you can recover your current model unless you printed it out (and that printout contains all of the data that was learned). Is your Python script still running? If it crashed after the error, then you're out of luck for sure. If it's still running and you really care that much, you could try digging into its memory to find the model's data structure. But that's probably more work than just training the model again. – Bailey Parker Jan 20 '18 at 20:48
  • Yeah, it crashed, but thanks for saving me from all this trouble next time. – singhuist Jan 20 '18 at 20:50
  • Ah, that's a shame. But on the plus side, you've hopefully learned the hard way and will never forget to do something like I suggested above in the future! :) Feel free to accept this answer if it helped: https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – Bailey Parker Jan 20 '18 at 21:15