I have a numpy array consisting of 20,000 RGB images of 220x220 pixels each. The array, X_data, therefore has the shape (20000, 220, 220, 3).
I'm looking for the fastest way to find the minimum and maximum pixel values across the entire dataset. I appreciate this type of task will take time because I'm searching through approximately 3 billion pixel values, but I'm hoping to improve on the solutions I have already found, which are the following:
Option 1: Flatten the array
Use ndarray.flatten() and then np.min and np.max on the resulting array:
flat = X_data.flatten()
np.min(flat)
np.max(flat)
This method took a total of 13min 11s (wall time) to find the min and max values.
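(As an aside, I gather that ndarray.flatten() always returns a copy, whereas np.ravel returns a view when it can, so the same two reductions should be possible without duplicating the roughly 3 GB of pixel data. A minimal sketch, assuming uint8 pixels and using a random stand-in for the real dataset:)

import numpy as np

# Hypothetical stand-in for the real dataset (~3 GB as uint8).
X_data = np.random.randint(0, 256, size=(20000, 220, 220, 3), dtype=np.uint8)

flat = X_data.ravel()  # a view, not a copy, unlike .flatten()
min_val = flat.min()
max_val = flat.max()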
Option 2: List comprehension
Use np.amin and np.amax to find the min and max for each image, append them to a list, and then find the min and max of that list:
min_val = np.min([np.amin(X_data[i]) for i in np.arange(X_data.shape[0])])
max_val = np.max([np.amax(X_data[i]) for i in np.arange(X_data.shape[0])])
This method took a total of 8 minutes (wall time).
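(A vectorized variant of the same idea, which I believe avoids the Python-level loop: passing axis=(1, 2, 3) collapses height, width, and channels in one reduction, leaving a single value per image. A sketch, again with a random stand-in for X_data:)

import numpy as np

# Hypothetical stand-in for the real (20000, 220, 220, 3) dataset.
X_data = np.random.randint(0, 256, size=(20000, 220, 220, 3), dtype=np.uint8)

# One reduction per statistic over the pixel axes, then a cheap final
# reduction over the 20,000 per-image values.
min_val = X_data.min(axis=(1, 2, 3)).min()
max_val = X_data.max(axis=(1, 2, 3)).max()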
Are there any faster methods for completing this task?
EDIT:
I forgot to mention in the original formulation of the question that I would like this to work on image datasets that have not been rescaled, i.e. those containing images of varying sizes. Such a dataset cannot be stored as a single rectangular ndarray, so calling np.min and np.max on the whole array at once will not work, even though it is faster than the options above.
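(Since the images then have different shapes, I assume the dataset has to live in a plain Python list, or an object array, rather than one rectangular ndarray. The per-image approach still applies; a sketch with a hypothetical ragged list called images:)

import numpy as np

# Hypothetical ragged dataset: differently sized images kept in a list,
# since they cannot share one rectangular ndarray.
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
    for h, w in [(220, 220), (180, 240), (300, 200)]
]

# One numpy reduction per image, then a Python-level min/max over the
# per-image results.
min_val = min(img.min() for img in images)
max_val = max(img.max() for img in images)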
Many thanks!