Build manager for GIS data processing

Question

My organization spends a lot of time processing GIS data. I have built a number of Python scripts that perform different steps of the data processing. Other than the first script, each script depends on another script finishing before it can start. Many of the scripts take 5+ minutes to execute (one takes over an hour), so I do not want to repeat already-executed steps. I want this to work similarly to Make, so that if an error occurs in "script3", I don't have to re-execute "script1" and "script2"; I can just re-run "script3".

Is SCons the right tool for this? I looked at it, and it seems to be focused on compiling code rather than running scripts. I'm open to other suitable tools.

That's a good question. I have nothing against it - I have never used a build management system before. As I am most familiar with Python, and a variety of open source projects that I admire use scons, I thought that scons would be a great tool to learn. However, make might be a good place to start, especially since my problem is fairly simple. — Tanner Semerad, Feb 02 '12 at 17:37
I have such a system based on `make`, in my case for graph rewriting and visualisation. I now know that make wasn't made for this - but I haven't found a good alternative yet. — reinierpost, Feb 04 '12 at 18:42

score 2 · Answer 1 · edited May 23 '17 at 12:31

I am not sure a build system is what you want. Unless I am missing something, what you want is some kind of controlled automation to execute your processing tasks, and handle runtime errors.

Of course, 'make' and 'SCons' can do that, but it would be like using a bazooka to hammer a nail. And you're actually overlooking something that might be easier and more rewarding to invest time in learning in the long run, which is Python itself. Python is a full-fledged, multi-paradigm programming language, with a lot of features for robust exception handling and interaction with the operating system (and it is heavily used in system administration on Unix-like platforms).

A first simple step would be to have a master script call each of your other scripts, each inside a try ... except block, and handle the exceptions according to your requirements. And you might improve on that as you go along, by refactoring your scripts into a consistent Python application.
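As a rough illustration of that first step, here is a minimal sketch of such a master script. It assumes each processing step is a standalone script that exits with a non-zero status when it fails; the function name `run_pipeline` and the file names `script1.py`, `script2.py`, `script3.py` are hypothetical placeholders, not anything from the original scripts.

```python
import subprocess
import sys


def run_pipeline(scripts):
    """Run each script in order and stop at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        try:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run([sys.executable, script], check=True)
        except (subprocess.CalledProcessError, OSError) as exc:
            print(f"{script} failed: {exc}", file=sys.stderr)
            break  # later steps depend on this one, so do not continue
        completed.append(script)
    return completed


if __name__ == "__main__":
    # Hypothetical file names standing in for the real processing steps.
    done = run_pipeline(["script1.py", "script2.py", "script3.py"])
    print("Completed:", ", ".join(done) if done else "nothing")
```

Note that this only handles the failure itself. On its own it does not remember which steps finished in an earlier session, so re-running it starts again from the first script.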

Here are some links to start with: link1, link2.

The problem is not error handling but avoiding rerunning the same processing on the same input twice. Of course you can write your own system to handle that in Python. — reinierpost, Feb 04 '12 at 18:44
Yep, basically he needs to track the progress of the whole processing session. The simplest way might be to modify each script so that it stores its state somewhere (in a temporary file, for instance) at the end of the task; when the master script resumes after a failed task, it would check that file to decide which script to run next. — b2Wc0EKKOvLPn, Feb 04 '12 at 21:48
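One way the state-file idea from these comments could look is sketched below. It is a variant of the suggestion, not the commenter's exact design: here the master script, rather than each individual script, writes an empty marker file after a step succeeds. The function name `run_with_checkpoints`, the `.pipeline_state` directory, and the `.done` suffix are all assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_with_checkpoints(scripts, state_dir=".pipeline_state"):
    """Run scripts in order, skipping any that already have a marker file.

    Returns the list of scripts actually executed in this call.
    """
    state = Path(state_dir)
    state.mkdir(parents=True, exist_ok=True)
    executed = []
    for script in scripts:
        marker = state / (Path(script).name + ".done")
        if marker.exists():
            continue  # this step finished in an earlier session
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the marker below is only written after a successful run.
        subprocess.run([sys.executable, script], check=True)
        marker.touch()
        executed.append(script)
    return executed
```

If the third script fails, the exception propagates, the markers for the first two remain, and the next call starts at the third script. Unlike Make, this sketch does not compare timestamps, so it will not notice that an input changed; deleting the relevant marker files (or the whole state directory) forces those steps to run again.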
