
I wrote a Python script that processes a large number of big text files and may run for a long time. Sometimes I need to stop the running script and resume it later. Possible reasons for stopping it include a program crash, the disk running out of space, and many other situations where I have to. I want to implement a kind of "stop/resume" mechanism for the script:

  • On stop: the script quits & saves its current state.
  • On resume: the script starts, but continues from the latest saved state.

I'm going to implement it using the pickle and signal modules.
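A minimal sketch of that pickle + signal plan, assuming the work is a list of file paths handled one at a time; `run`, `process`, `request_stop`, and the checkpoint path are hypothetical names. The signal handler only sets a flag, and the loop saves state between files, so the pickle is never written while the state is half updated. Because a checkpoint is written after every file, a hard crash or `kill -9` (which no handler can intercept) loses at most the file in progress.

```python
import os
import pickle
import signal

stop_requested = False  # set by the signal handler, checked by the main loop


def request_stop(signum, frame):
    """Signal handler: only record the request; state is saved at a safe point."""
    global stop_requested
    stop_requested = True


def load_state(state_file):
    """Return the saved state, or a fresh one if there is no checkpoint yet."""
    try:
        with open(state_file, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {"next_index": 0}


def save_state(state, state_file):
    """Write to a temporary file, then atomically replace the checkpoint."""
    tmp = state_file + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, state_file)


def run(paths, process, state_file):
    """Process `paths` in order, resuming after the last finished file.

    Returns True when everything is done, False if stopped by a signal.
    """
    global stop_requested
    stop_requested = False
    signal.signal(signal.SIGINT, request_stop)   # Ctrl-C
    signal.signal(signal.SIGTERM, request_stop)  # e.g. `kill <pid>`
    state = load_state(state_file)
    for i in range(state["next_index"], len(paths)):
        process(paths[i])
        state["next_index"] = i + 1
        save_state(state, state_file)  # checkpoint after every file
        if stop_requested:
            print("Stop requested; progress saved.")
            return False
    return True
```

Running the same command again after a stop picks up at `next_index`, since the checkpoint file is read on every start.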

I'd be glad to hear how to do it in a Pythonic way.

Thank you!

codeape
Valentine
  • You will likely need some external control, like a scheduled task (or a cron job on Linux). Also, when the program stops, write some status info to a specific file on disk so that your program knows what to do when it restarts – inspectorG4dget Jun 09 '11 at 21:37
  • If on *nix systems, you can use the standard SIGSTOP and SIGCONT signals, although the process will remain in (virtual) memory until continued. – tzot Jun 10 '11 at 09:25
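tzot's SIGSTOP/SIGCONT suggestion in sketch form: a parent process freezes a worker with SIGSTOP and thaws it with SIGCONT. This is POSIX-only, and as the comment notes, the frozen process still occupies memory, so it does not survive a reboot or a crash. The function name `run_with_pause` and its timings are hypothetical.

```python
import signal
import subprocess
import sys
import time


def run_with_pause(cmd, run_for=0.2, pause_for=0.5):
    """Start `cmd`, suspend it with SIGSTOP, then resume it with SIGCONT (POSIX only)."""
    proc = subprocess.Popen(cmd)
    time.sleep(run_for)
    proc.send_signal(signal.SIGSTOP)  # suspended: the process stays in memory
    time.sleep(pause_for)
    proc.send_signal(signal.SIGCONT)  # continues exactly where it stopped
    return proc.wait()                # exit status of the worker
```

From a shell, the same effect comes from `kill -STOP <pid>` and `kill -CONT <pid>`.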

3 Answers


Here is something simple that hopefully can help you:

import time
import pickle


REGISTRY = None  # pickled progress; lives only in memory while the interpreter runs


def main(start=0):
    """Do some heavy work ..."""

    global REGISTRY

    a = start
    while True:
        time.sleep(1)
        a += 1
        print(a)
        # Record progress after each step so Ctrl-C can resume from here.
        REGISTRY = pickle.dumps(a)


if __name__ == '__main__':
    print("To stop the script execution type CTRL-C")
    while True:
        # Resume from the last recorded value, or start at 0.
        start = pickle.loads(REGISTRY) if REGISTRY else 0
        try:
            main(start=start)
        except KeyboardInterrupt:
            resume = input('If you want to continue type the letter c:')
            if resume != 'c':
                break

Example of running:

$ python test.py
To stop the script execution type CTRL-C
1
2
3
^CIf you want to continue type the letter c:c
4
5
6
7
8
9
^CIf you want to continue type the letter c:
$ python test.py
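Because REGISTRY is an ordinary global, it only survives a Ctrl-C inside the same process; once the script exits, a new `python test.py` starts counting from 1 again. A minimal sketch of a variant that keeps the counter in a file instead, assuming a hypothetical `registry.pkl` in the working directory; the `steps` and `delay` parameters are added only so the loop can be exercised without waiting.

```python
import os
import pickle
import time

REGISTRY_FILE = "registry.pkl"  # hypothetical checkpoint path, not part of the original answer


def load_start(path=REGISTRY_FILE):
    """Return the last saved counter, or 0 if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return 0


def main(start=0, path=REGISTRY_FILE, steps=None, delay=1.0):
    """Same counting loop as above, but progress is written to disk."""
    a = start
    while steps is None or a < start + steps:
        time.sleep(delay)
        a += 1
        print(a)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(a, f)
        os.replace(tmp, path)  # atomic swap: Ctrl-C mid-write never corrupts the checkpoint
```

Calling `main(start=load_start())` inside a `try`/`except KeyboardInterrupt` at the script's entry point then resumes across separate runs, not just within one.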
fireant
mouad
    The OP wants to process a number of text files. So, there should be file handles in the globals. `pickle` can't serialize file handles. Therefore, your answer should not work, in general… and especially for what the OP wants. – Mike McKerns Mar 13 '15 at 02:21
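One common workaround for the file-handle problem, sketched here rather than taken from any of the answers, is to checkpoint the file's path and byte offset instead of the file object, then reopen and `seek` on resume. `process_file`, `handle_line`, and the checkpoint layout are hypothetical, and lines are assumed to be UTF-8.

```python
import os
import pickle


def load_offset(path, checkpoint):
    """Byte offset to resume from, or 0 if the checkpoint is missing or for another file."""
    if not os.path.exists(checkpoint):
        return 0
    with open(checkpoint, "rb") as f:
        saved_path, offset = pickle.load(f)
    return offset if saved_path == path else 0


def save_offset(path, offset, checkpoint):
    """Pickle (path, offset): both are plain values, unlike the open file."""
    tmp = checkpoint + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((path, offset), f)
    os.replace(tmp, checkpoint)


def process_file(path, handle_line, checkpoint, every=1000):
    """Feed each line to `handle_line`, checkpointing every `every` lines."""
    with open(path, "rb") as src:  # binary mode gives exact, seekable byte offsets
        src.seek(load_offset(path, checkpoint))
        count = 0
        while True:
            line = src.readline()
            if not line:
                break
            handle_line(line.decode("utf-8"))
            count += 1
            if count % every == 0:
                save_offset(path, src.tell(), checkpoint)
        save_offset(path, src.tell(), checkpoint)  # mark the whole file as done
```

With `every` greater than 1, up to `every` lines may be processed again after a crash, so `handle_line` should tolerate repeats; `every=1` avoids that at the cost of one disk write per line.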

If you are looking to read big files, just use a file handle and read the lines one at a time, processing each line as you need to. If you'd like to save the Python session, just use dill.dump_session, which saves all existing objects. The other answers will fail here because pickle cannot pickle a file handle; dill, however, can serialize almost every Python object, including a file handle.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('bigfile1.dat', 'r')
>>> data = f.readline()  
>>> 
>>> dill.dump_session('session.pkl')
>>> 

Then quit the Python session and restart it. When you call load_session, you get back all the objects that existed at the time of the dump_session call.

dude@hilbert>$ python
Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('session.pkl')
>>> len(data)
9
>>> data += f.readline()
>>> f.close()
>>> 

Simple as that.

Get dill here: https://github.com/uqfoundation
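For contrast, a quick check of the claim that the standard pickle module refuses an open file object. The helper name is hypothetical, and the exact error text varies between Python versions, so only the exception type is relied on.

```python
import pickle
import tempfile


def try_pickle_open_file():
    """Return the error message pickle raises for an open file, or None if it succeeds."""
    with tempfile.TemporaryFile() as f:
        try:
            pickle.dumps(f)
        except TypeError as err:
            return str(err)
    return None


print(try_pickle_open_file())
```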

Mike McKerns

The process could simply sleep its life away until it is resumed, or (security concerns aside) the state of the script can be pickled, zipped, and stored.
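The "pickled, zipped, and stored" route takes only a few lines with the standard gzip module; the function names and path below are hypothetical. Since unpickling can execute arbitrary code, only load checkpoints the script wrote itself.

```python
import gzip
import pickle


def save_state(state, path):
    """Pickle `state` and gzip-compress it in a single pass."""
    with gzip.open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path):
    """Decompress and unpickle a checkpoint written by save_state."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```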

http://docs.python.org/library/pickle.html

http://docs.python.org/library/marshal.html

http://docs.python.org/library/stdtypes.html (5.9)

http://docs.python.org/library/archiving.html

http://www.henrysmac.org/?p=531

motoku