1

I am developing a machine learning analysis program that has to process 27 GB of text files on Linux. My production system won't be rebooted very often, but I need to test the program on my home computer or development environment.

I have power failures very often, so I can hardly run it continuously for 3 weeks.

My program reads the files, applies some parsing, saves the filtered data to new files in a dictionary, then applies the algorithm to those files and saves the result in a MySQL database.

I cannot find a way to save the algorithm state so that the run can resume after an interruption.

0x90
user2027303
    Related: http://stackoverflow.com/q/5697720/946850, http://stackoverflow.com/q/2134771/946850. Bottom line of the answers: Don't try saving the state of the entire process, but save "user data" (as suggested by tripplet). See also: http://en.wikipedia.org/wiki/Application_checkpointing – krlmlr Feb 01 '13 at 10:38

2 Answers

2

If everything regarding the algorithm state is saved in a class, you can serialize the class and save it to disk: http://docs.python.org/2/library/pickle.html
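A minimal sketch of that idea, assuming the state fits in a small class. The names `AlgorithmState`, `save_checkpoint`, `load_checkpoint`, and `process` are hypothetical, and the per-file "work" is only a placeholder. It pickles the instance's attributes after each input file and writes through a temporary file, so a power failure in the middle of a write should not corrupt the previous checkpoint:

```python
import os
import pickle
import tempfile


class AlgorithmState:
    """Hypothetical container for everything needed to resume a run."""

    def __init__(self):
        self.files_done = []     # input files already processed
        self.partial_result = 0  # stand-in for the algorithm's running state


def save_checkpoint(state, path):
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint, so an interrupted write leaves the old file intact.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        # Pickle the attribute dictionary rather than the instance itself,
        # which keeps the file independent of the class's import path.
        pickle.dump(vars(state), f, protocol=pickle.HIGHEST_PROTOCOL)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # Resume from the last checkpoint, or start fresh if there is none.
    state = AlgorithmState()
    if os.path.exists(path):
        with open(path, "rb") as f:
            vars(state).update(pickle.load(f))
    return state


def process(filenames, checkpoint_path):
    state = load_checkpoint(checkpoint_path)
    for name in filenames:
        if name in state.files_done:
            continue  # already handled before the last interruption
        state.partial_result += len(name)  # placeholder for the real work
        state.files_done.append(name)
        save_checkpoint(state, checkpoint_path)
    return state
```

Calling `process(list_of_files, "state.pkl")` again after a crash would skip the files recorded in the checkpoint and continue with the rest. How often to checkpoint is a trade-off between write cost and the amount of work lost on a failure.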

tripplet
1

Since the entire algorithm state can be saved in a class, you might want to use pickle (as mentioned above), but pickle comes with its own overhead and risks.

For better ways to do the same, you might want to check out this article, which explains why you should use the camel library instead of pickle.
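To illustrate the general alternative of storing only plain, explicitly versioned data instead of pickled objects, here is a minimal sketch using the standard-library `json` module. This is not Camel's API. The names `FORMAT_VERSION`, `dump_state`, and `load_state` are hypothetical, and it assumes the state can be expressed as lists and numbers:

```python
import json

FORMAT_VERSION = 1  # hypothetical version tag for the checkpoint layout


def dump_state(files_done, partial_result):
    # Only plain data goes into the checkpoint, never code or class references.
    return json.dumps({
        "version": FORMAT_VERSION,
        "files_done": list(files_done),
        "partial_result": partial_result,
    })


def load_state(text):
    data = json.loads(text)
    # Refuse layouts this code does not know how to read.
    if data.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported checkpoint version: %r" % data.get("version"))
    return data["files_done"], data["partial_result"]
```

Loading such a file cannot run arbitrary code, and the version field gives you a place to handle older layouts if the state format changes during a long project.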

Cody Gray - on strike
Dhruv Shah