I am struggling to work with large numpy arrays. Here is the scenario: I am working with 300MB - 950MB images and using GDAL to read them as numpy arrays. Reading in the array uses exactly as much memory as one would expect, i.e. about 250MB for a 250MB image, and so on.
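Roughly, the reading step looks like this (a simplified sketch; the file name is a placeholder and I read one band at a time):

import numpy
from osgeo import gdal

dataset = gdal.Open('large_image.tif')   # placeholder name; real files are 300MB - 950MB
band = dataset.GetRasterBand(1)
input_array = band.ReadAsArray()         # 2D ndarray, roughly the size of the image on disk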
My problem occurs when I use numpy to get the mean, min, max, or standard deviation. In main() I open the image and read the array (type ndarray). I then call the following function, to get the standard deviation, on a 2D array:
def get_array_std(input_array):
    array_standard_deviation = numpy.std(input_array, copy=False)
    return array_standard_deviation
This call is where I consistently hit memory errors (on a machine with 6GB of RAM). From the documentation it looks like numpy returns an ndarray with the same shape and dtype as my input, which would double the in-memory size.
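One way I could check how much extra memory the call itself uses is to watch the process's peak RSS around it (a rough sketch using the standard-library resource module, which I believe reports ru_maxrss in kilobytes on Linux; the dummy array here just stands in for a real image):

import resource
import numpy

def peak_rss_mb():
    # Peak resident set size so far; ru_maxrss is in kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

input_array = numpy.ones((10000, 10000), dtype=numpy.float32)   # ~380MB stand-in
print(peak_rss_mb())
array_standard_deviation = numpy.std(input_array)
print(peak_rss_mb())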
Using:
print type(array_standard_deviation)
Returns:
numpy.float64
Additionally, using:
print array_standard_deviation
Returns a single float standard deviation, as one would expect. Is numpy reading the array in again to perform this calculation? Would I be better off iterating through the array and manually performing the calculation(s)? How about working with a flattened array?
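For reference, this is roughly what I mean by iterating through the array manually: read the band in blocks of rows and accumulate running sums (the block size is arbitrary, and I have not checked whether the sum-of-squares formula is numerically stable enough for my data):

import numpy
from osgeo import gdal

def blockwise_stats(path, block_rows=256):
    # Only block_rows * XSize values are held in memory at any one time
    dataset = gdal.Open(path)
    band = dataset.GetRasterBand(1)
    xsize, ysize = band.XSize, band.YSize
    count, total, total_sq = 0, 0.0, 0.0
    minimum, maximum = None, None
    for yoff in range(0, ysize, block_rows):
        rows = min(block_rows, ysize - yoff)
        block = band.ReadAsArray(0, yoff, xsize, rows).astype(numpy.float64)
        count += block.size
        total += block.sum()
        total_sq += numpy.square(block).sum()
        minimum = block.min() if minimum is None else min(minimum, block.min())
        maximum = block.max() if maximum is None else max(maximum, block.max())
    mean = total / count
    std = numpy.sqrt(total_sq / count - mean * mean)
    return minimum, maximum, mean, std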
I have tried placing each statistic call (numpy.amin(), numpy.amax(), numpy.std(), numpy.mean()) into its own function so that the large array would go out of scope, but no luck there. I have also tried casting the return value to another type, but no joy.
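By separate functions I mean nothing more elaborate than this, mirroring get_array_std() above:

def get_array_min(input_array):
    return numpy.amin(input_array)

def get_array_max(input_array):
    return numpy.amax(input_array)

def get_array_mean(input_array):
    return numpy.mean(input_array)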