I have a large series of raster datasets representing monthly rainfall over several decades. I've written a script in Python that loops over each raster and does the following:
- Converts the raster to a numpy masked array,
- Performs lots of array algebra to calculate a new water level,
- Writes the result to an output raster,
- Repeats.
The script is essentially just a long list of array algebra equations inside a loop (a stripped-down sketch is below).
Everything works fine if I run the script on a small subset of my data (say 20 years' worth), but if I try to process the whole lot I get a MemoryError. The error doesn't give any more information than that, except that it points to the line of code at which Python gave up.
Unfortunately, I can't easily process my data in chunks: at the end of each iteration the output (water level) is fed back into the next iteration as its starting point, so I really need to be able to do the whole lot in one run.
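To make the structure concrete, here is a heavily stripped-down sketch of the loop. The file names, coefficients and the GDAL read/write helpers are just placeholders for illustration; the real script has many more equations between the read and the write.

```python
import numpy as np
from osgeo import gdal


def read_masked(path):
    """Read band 1 as a numpy masked array (assumes a nodata value is set)."""
    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    return np.ma.masked_equal(band.ReadAsArray(), band.GetNoDataValue()), ds


def write_raster(path, array, template):
    """Write a masked array to a GeoTIFF, copying georeferencing from a template dataset."""
    driver = gdal.GetDriverByName("GTiff")
    out = driver.Create(path, template.RasterXSize, template.RasterYSize,
                        1, gdal.GDT_Float32)
    out.SetGeoTransform(template.GetGeoTransform())
    out.SetProjection(template.GetProjection())
    out.GetRasterBand(1).WriteArray(array.filled(-9999))
    out.GetRasterBand(1).SetNoDataValue(-9999)
    out.FlushCache()


# Placeholder inputs -- the real script builds these lists from decades of monthly files.
rainfall_paths = ["rain_1980_01.tif", "rain_1980_02.tif"]   # ...hundreds more
water_level, template = read_masked("initial_water_level.tif")

for path in rainfall_paths:
    rainfall, _ = read_masked(path)

    # ...long list of array algebra, heavily simplified here...
    recharge = rainfall * 0.25                    # placeholder coefficient
    water_level = water_level + recharge - 0.1    # placeholder losses

    write_raster(path.replace("rain", "level"), water_level, template)
    # water_level is carried straight into the next iteration as its starting point
```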
My understanding of programming is still fairly basic, but I thought all of my objects would simply be overwritten on each iteration. I (stupidly?) assumed that if the code managed to loop successfully once, it should be able to loop indefinitely without using more and more memory.
I've tried reading various bits of documentation and have discovered something called the "Garbage Collector", but I feel like I'm getting out of my depth and my brain's melting! Can anyone offer some basic insight into what actually happens to the objects in memory as my code loops? Is there a way of freeing up memory at the end of each iteration, or is there a more "Pythonic" way of writing the code that avoids this problem altogether?