2

I have a numpy array consisting of 20,000 RGB images of 220x220 pixels each. The array, X_data, therefore has the shape (20000, 220, 220, 3).

I'm looking for the fastest way to find the minimum and maximum pixel values across the entire dataset. I appreciate this task will take time because I'm searching through approximately 3 billion pixel values, but I'm hoping to improve on the solutions I have already found, which are the following:

Option 1: Flatten the array

Use ndarray.flatten() and then np.min and np.max on the resulting array:

flat = X_data.flatten()
np.min(flat)
np.max(flat)

This method took a total of 13min 11s (wall time) to find the min and max values.

Option 2: List comprehension

Use np.amin and np.amax to find the min and max for each image, append them to a list, and then find the min and max of that list:

min_val = np.min([np.amin(X_data[i]) for i in np.arange(X_data.shape[0])])
max_val = np.max([np.amax(X_data[i]) for i in np.arange(X_data.shape[0])])

This method took a total of 8 minutes (wall time).

Are there any faster methods for completing this task?

EDIT:

I forgot to mention in the original formulation of the question that I would like this to work on image datasets that have not been rescaled, i.e. those containing images of varying sizes. This means that calling np.min and np.max directly on the whole dataset will not work, even though it is faster than the options above.
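To illustrate, such a mixed-size dataset has to live in a Python list (or object array) rather than one rectangular ndarray, so the per-image reduction is the form that generalizes. A minimal sketch with made-up image sizes:

```python
import numpy as np

# Hypothetical mixed-size dataset: a list, since a single rectangular
# ndarray cannot hold images of different shapes.
X_data = [np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
          for (h, w) in [(220, 220), (180, 240), (300, 200)]]

# Per-image reduction still works: one scalar pair per image.
min_val = min(img.min() for img in X_data)
max_val = max(img.max() for img in X_data)
```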

Many thanks!

celerygemini

3 Answers

4

The first approach is slower because flatten generates a copy of your data. Since your data is large, a significant portion of the runtime goes to allocating and copying memory.

On the other hand, np.min by default reduces over all axes, e.g.:

np.min([[1,2],[3,4]])
# out: 1

So you can just do:

min_val, max_val = np.min(X_data), np.max(X_data)

which should be faster.
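To illustrate the axis behaviour on made-up data: the default axis=None reduces over everything at once, while passing the pixel axes gives one extreme per image:

```python
import numpy as np

# Small made-up stack of 5 images, 4x4 pixels, 3 channels.
X = np.random.randint(0, 256, (5, 4, 4, 3), dtype=np.uint8)

lo, hi = np.min(X), np.max(X)        # reduce over all axes (axis=None)
per_image = X.min(axis=(1, 2, 3))    # one minimum per image, shape (5,)

# The global minimum is the minimum of the per-image minima.
assert lo == per_image.min()
```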

Quang Hoang
  • Thanks for the reply! `np.min` is indeed faster, but the problem is that it only works if all the images are the same size. I'm looking for a general solution that works across arrays that include different image sizes, but I didn't make this clear in my question so I'll make an edit to that effect. – celerygemini Oct 23 '20 at 16:45
  • 1
    @celery_gemini in which case, only the second option works. Note that numpy operations only work best on homogeneous data, which isn’t the case here as you mentioned. – Quang Hoang Oct 23 '20 at 16:51
3

If maximum speed is important, it might be worthwhile adding a library that can get the maximum and minimum values at the same time. Such a routine would only need to traverse the large array once instead of twice. Since getting these values is completely memory-limited on any modern platform (the computation is minimal), halving the memory accesses is likely to halve the execution time. Basically, you get both values for the price of one.
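NumPy itself has no fused min+max reduction, but the single-traversal idea can be approximated in pure NumPy by streaming through the data in chunks small enough to stay in cache, so each chunk is fetched from RAM only once and reduced for both extremes while hot. A sketch, not a tuned implementation; the chunk size is a guess:

```python
import numpy as np

def minmax_one_pass(a, chunk=1 << 20):
    # Reduce each cache-sized chunk for min and max while it is still
    # hot in cache, approximating a single traversal of the data.
    flat = np.ravel(a)
    lo, hi = flat[0], flat[0]
    for i in range(0, flat.size, chunk):
        block = flat[i:i + chunk]
        lo = min(lo, block.min())
        hi = max(hi, block.max())
    return lo, hi
```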

This answer links a Fortran implementation into Python, and found the NumPy functions to be excessively slow as well. But that's an 8-year-old answer; things have likely improved in the NumPy implementation since then.

DIPlib is an image processing library (I'm an author), and since I already have it installed, it was easy for me to try out its function that gets both the max and min value of an array. I'm using a smaller array than in the OP, since I don't want to wait that long for the experiment to finish. Here's the code:

import numpy as np
import diplib as dip
import time

X = np.random.randn(2000, 220, 220, 3)

t = time.time()
np.min(X), np.max(X)
print(time.time() - t)

t = time.time()
dip.MaximumAndMinimum(X)
print(time.time() - t)

Median over three runs (without changing the X array) gives 0.361 s for the first section and 0.135 s for the second.

Cris Luengo
0

Finding the max (or the min) requires a full pass over the values. When both are needed, calling np.max() and then np.min() makes it traverse the data twice...

I prefer simply using this :

the_min, the_max = np.percentile(my_data,[0,100])

which computes both in a single call...
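One caveat worth knowing with this approach: np.percentile always returns float64 results, even for integer input, so the extremes come back as floats rather than in the array's own dtype. A small illustrative example:

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5], dtype=np.uint8)
lo, hi = np.percentile(a, [0, 100])

# The values are exact, but the dtype is float64, not uint8.
assert (lo, hi) == (1.0, 5.0)
assert isinstance(lo, np.float64)
```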

Ananth
  • 815
  • 1
  • 4
  • 11
Lucien