
I am using Python 3 and NumPy for some scientific data analysis, and I am faced with a memory-related issue. When looping over a list of numpy arrays (a few thousand of them) and doing a few intermediate calculations, I have noticed Python taking up over 6GB more memory than I would expect. I have isolated the issue to a single function, shown below:

def overlap_correct(self):
    running_total = np.zeros((512, 512))  # accumulated counts for the current shutter window
    shutter = 0
    for data_index in range(len(self.data)):
        if self.TOF[data_index] < self.shutter_times[shutter]:
            # fraction of triggers for which each pixel has already been occupied
            occupied_prob = running_total/self.N_TRIGS[data_index]
            running_total += self.data[data_index]
            # correct this frame for overlap and round back to whole counts
            self.data[data_index] = np.round(np.divide(self.data[data_index], (1 - occupied_prob)))
        else:
            # crossed into the next shutter window: reset the accumulator
            running_total = np.zeros((512, 512))
            shutter += 1

The relevant data structures here are self.data, which is a list of a few thousand 512x512 numpy arrays; self.TOF and self.N_TRIGS, which are numpy arrays of a few thousand floats; and self.shutter_times, which is a numpy array with three floats.

During the processing of this loop, which takes a few minutes, I can observe the memory usage of Python gradually increasing, until the loop finishes with about 6GB more memory used up than when it started.

I have used memory_profiler and objgraph to analyse the memory usage without any success. I am aware that before and after the loop, self.data, self.TOF, self.N_TRIGS, and self.shutter_times remain the same size and hold the same number of elements of the same type. If I understand this correctly, local variables such as occupied_prob should go out of scope after every iteration of the for loop, and even if they did not, any redundant memory should be garbage collected after the function returns to the main loop. This does not happen, and the 6GB remain locked up until the script terminates. I have also attempted to run manual garbage collection using gc.collect(), without any results.
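
For reference, this is roughly how I set up the profiling (a minimal, hypothetical reproduction with dummy data standing in for my actual code; the names process and frames are made up):

import numpy as np
from memory_profiler import profile

@profile  # prints a line-by-line memory report when the function runs
def process(frames, n_trigs):
    running_total = np.zeros((512, 512))
    for i in range(len(frames)):
        occupied_prob = running_total / n_trigs[i]
        running_total += frames[i]
        frames[i] = np.round(np.divide(frames[i], 1 - occupied_prob))

frames = [np.ones((512, 512), dtype='uint16') for _ in range(50)]
process(frames, np.full(50, 1000.0))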

If it helps, this function exists inside a thread and is part of a larger data analysis process. No other threads attempt to concurrently access the data, and after the thread exits, self.data is copied to a different class. The instance of the thread is then destroyed by going out of scope. I have also attempted to manually destroy the thread using del thread_instance as well as thread_instance = None, but the 6GB remains locked up. This is not a huge issue on the development machine, but the code will be part of a larger package which may run on machines with limited RAM.

  • 6GB seems about right for holding an array of size `a few thousand x 512 x 512 x bytes per value`. Or do you mean 6GB more than what you would otherwise expect with just the storage of these structures in memory? – faisal Jan 13 '19 at 04:44
  • You're using python 3, so isn't `/` the same as `np.divide`? (With `//` now being floor division?). I think you can do `self.data[data_index] /= 1 - occupied_prob` and save an array-copy step. I also suspect you can round in-place. – aghast Jan 13 '19 at 04:50
  • @faisal the data is stored in `self.data` and already takes up about a GB or two but no more, as far as am aware. The issue is that after this function returns, around 6GB MORE memory is used up that I am unable to free. – Alexander Liptak Jan 13 '19 at 05:16
  • @AustinHastings thanks for your ideas, doing one less array-copy was a good idea. I am unaware of in-place rounding, only forcing dtype=int and dropping the decimal point, which isn't a good solution for me – Alexander Liptak Jan 13 '19 at 05:17
  • @AlexanderLiptak What are the values in self.data before calling this function, is it all zeros? That can also explain difference in memory as explained [here](https://stackoverflow.com/questions/44487786/performance-of-zeros-function-in-numpy). – faisal Jan 13 '19 at 06:49
  • @faisal it, unfortunately, is not, `self.data` is filled with integers (some of which may be zero but most are not). The function `def overlap_correct()` here only alters the data slightly but is still forced to be of the same `dtype=int`, so that does not explain the extra memory. What's more, I checked the size of `self.data` before and after the function ran, and it was the same. – Alexander Liptak Jan 13 '19 at 16:58
  • You are appending to an array; you may be triggering a lot of copies at the `+=` line. But I don't know the numpy array semantics. – geckos Jan 13 '19 at 19:19
  • @geckos I believe the `+=` when using numpy arrays simply adds the contents of one array to another in place, as long as it's possible within its set `dtype`, without actually copying the array to another location in memory. I may be wrong though. – Alexander Liptak Jan 13 '19 at 19:24
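
Not from the original thread, but a small sketch of the in-place behaviour discussed in the comments above: `+=` on a numpy array modifies the existing buffer rather than making a copy, while an in-place `/=` whose result would be float on an integer array raises a casting error instead of silently copying:

import numpy as np

a = np.zeros((2, 2))
b = np.ones((2, 2), dtype='uint16')

before = id(a)
a += b                       # in-place add: same buffer, no new array allocated
assert id(a) == before

c = np.arange(4, dtype='uint16')
try:
    c /= 1.5                 # true division cannot be cast back to uint16 in place
except TypeError as err:
    print(err)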

1 Answer


I have managed to find a solution to the issue. TL;DR: During the execution of the function, the dtype of self.data was not enforced.

The first thing that prevented me from realising this is that using sys.getsizeof() to check how much space self.data was taking up in memory only gave me the size of the list of pointers to the numpy.ndarray objects, which stayed the same because the number of arrays did not change.
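
To illustrate (a toy example, not the real data): sys.getsizeof() only reports the list object itself, so it stays the same even when one of the arrays it points to silently grows from uint16 to float64, whereas summing arr.nbytes shows the real memory held by the arrays:

import sys
import numpy as np

frames = [np.zeros((512, 512), dtype='uint16') for _ in range(10)]
print(sys.getsizeof(frames))              # size of the list of references only
print(sum(arr.nbytes for arr in frames))  # bytes actually held by the arrays

frames[3] = frames[3].astype('float64')   # one frame silently becomes four times larger
print(sys.getsizeof(frames))              # unchanged
print(sum(arr.nbytes for arr in frames))  # shows the extra memory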

Secondly, as I was checking the dtype of self.data[0], which was the only unchanged data "slide", I wrongly assumed that the whole list of arrays also had the same dtype.

I suspect that the reason the dtype of some of the arrays changed is that np.divide() promotes the integer data to float64, and np.round() returns that rounded float64 array rather than casting it back to the original dtype.
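
A quick way to see this (made-up numbers, not the actual data): the division promotes a uint16 frame to float64, and np.round() keeps that dtype, so the array stored back into the list takes four times the memory of the original:

import numpy as np

frame = np.full((512, 512), 100, dtype='uint16')
occupied_prob = np.full((512, 512), 0.1)

corrected = np.round(np.divide(frame, 1 - occupied_prob))
print(frame.dtype, frame.nbytes)          # uint16, 524288 bytes
print(corrected.dtype, corrected.nbytes)  # float64, 2097152 bytes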

By changing the structure of self.data from a list of a few thousand 512x512 arrays into a single 3D array of shape [a few thousand]x[512]x[512], the dtype of the data is no longer guessed per array; instead, the float64 returned by np.round() is silently cast back to uint16 on assignment:

self.data = np.asarray(self.data, dtype='uint16')
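
With everything held in a single uint16 ndarray, assigning the float64 result of np.round() into a row casts it back to uint16 on assignment instead of replacing the row with a float64 array (a minimal sketch of that behaviour with made-up values, not the actual processing code):

import numpy as np

data = np.zeros((3, 512, 512), dtype='uint16')
data[:] = 100

corrected = np.round(data[0] / 0.9)   # float64 intermediate, as before
data[0] = corrected                   # silently cast back to uint16 on assignment
print(data.dtype, data.nbytes)        # still uint16: the stored data never grows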