How to calculate mean and standard deviation of a set of images

Question

I would like to know how to calculate the mean and the std of a given dataset of RGB images.
For example, for ImageNet we have imagenet_stats: ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).
I tried:

rgb_values = [np.mean(Image.open(img).getdata(), axis=0)/255 for img in imgs_path]
np.mean(rgb_values, axis=0)
np.std(rgb_values, axis=0)

I am not sure that the values I get are correct.
What would be a better implementation?

@VladimirFokow Yes, I assume that `imagenet_stats` are calculated per pixel. — Simone, Aug 15 '22 at 08:31
Related question: Explanation of [`imagenet_stats`](https://stackoverflow.com/q/58151507/14627505) — Vladimir Fokow, Aug 15 '22 at 08:31
This explains how to calculate the `mean` and the `std` of a dataset of `RGB` images: [How imagenet mean and std derived?](https://datascience.stackexchange.com/q/77084/136526) (Data Science Stack Exchange). — Vladimir Fokow, Aug 15 '22 at 08:57

Vladimir Fokow · Answer 1 · 2022-08-15T20:28:19.533

Two solutions:

The first solution iterates over the images. It is MUCH slower than the second solution, and it uses the same amount of memory because it first loads all the images and stores them in a list. So it is strictly worse than the second solution, unless you change how your images are loaded and instead load and process them one by one from disk.
The second solution needs to hold all images in memory at the same time. It is MUCH faster, because it is fully vectorized.

First solution (iterating over the images):

For each channel: R, G, B, here is how to calculate the means and stds of all the pixels in all the images:

Requirement:

Each image has the same number of pixels.

If this is not the case - use the second solution (below).

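import numpy as np
from PIL import Image

# imgs_path: list of image file paths (as in the question)
images_rgb = [np.array(Image.open(img).getdata()) / 255. for img in imgs_path]
# Each image_rgb is of shape (n, 3),
# where n is the number of pixels in each image,
# and 3 are the channels: R, G, B.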

means = []
for image_rgb in images_rgb:
    means.append(np.mean(image_rgb, axis=0))
mu_rgb = np.mean(means, axis=0)  # mu_rgb.shape == (3,)

variances = []
for image_rgb in images_rgb:
    var = np.mean((image_rgb - mu_rgb) ** 2, axis=0)
    variances.append(var)
std_rgb = np.sqrt(np.mean(variances, axis=0))  # std_rgb.shape == (3,)
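To illustrate why the equal-pixel-count requirement matters: when images differ in size, a plain average of the per-image means is biased towards the small images, and each mean has to be weighted by its pixel count instead. The following is only a minimal sketch; `combine_image_means` and the toy arrays are hypothetical names made up for this example, not part of the code above.

```python
import numpy as np

def combine_image_means(image_means, pixel_counts):
    """Combine per-image channel means into the mean over all pixels.

    image_means:  array-like of shape (m, 3), one RGB mean per image
    pixel_counts: array-like of shape (m,), number of pixels in each image
    """
    image_means = np.asarray(image_means, dtype=np.float64)
    pixel_counts = np.asarray(pixel_counts, dtype=np.float64)
    # Weight each image's mean by its pixel count, then normalize.
    weighted_sum = (image_means * pixel_counts[:, None]).sum(axis=0)
    return weighted_sum / pixel_counts.sum()


# Two toy "images" with different pixel counts (values already in [0, 1]).
small = np.array([[0.9, 0.9, 0.9]])                    # 1 pixel
large = np.array([[0.1, 0.2, 0.3], [0.3, 0.4, 0.5]])   # 2 pixels

per_image_means = [small.mean(axis=0), large.mean(axis=0)]
weighted = combine_image_means(per_image_means, [len(small), len(large)])
unweighted = np.mean(per_image_means, axis=0)           # biased for unequal sizes
true_mean = np.concatenate([small, large]).mean(axis=0)
```

Here `weighted` agrees with `true_mean`, while `unweighted` does not. With equal pixel counts the weights cancel and the two coincide, which is the case the loop above relies on.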

Proof

... that the mean and std will be the same whether calculated like this or using all pixels at once:

Let's say each image has n pixels (with values vals_i), and there are m images.

Then there are (n*m) pixels.

The real_mean of all pixels across all the vals_i is:

total_sum = sum(vals_1) + sum(vals_2) + ... + sum(vals_m)
real_mean = total_sum / (n*m)

Adding up the means of each image individually (each image has n pixels, so its mean is sum(vals_i) / n):

sum_of_means = sum(vals_1) / n + sum(vals_2) / n + ... + sum(vals_m) / n
             = (sum(vals_1) + sum(vals_2) + ... + sum(vals_m)) / n

Now, what is the relationship between real_mean and sum_of_means? Since total_sum = sum_of_means * n,

real_mean = sum_of_means / m

That is, real_mean is the average of the m per-image means.

Analogously, using the formula for the standard deviation, the real_std of all pixels across all the vals_i is:

sum_of_square_diffs =  sum((vals_1 - real_mean) ** 2)
                     + sum((vals_2 - real_mean) ** 2)
                     + ...
                     + sum((vals_m - real_mean) ** 2)
real_std = sqrt( sum_of_square_diffs / (n*m) )

Rearranged, sum_of_square_diffs / (n*m) is the average over the m images of each image's mean squared deviation from real_mean (not from the image's own mean), and real_std is the square root of that average.
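The same argument can be written compactly in standard notation. This is only a restatement under the assumptions above (m images, n pixels each); the symbol x_{ij} for pixel j of image i is introduced here for illustration.

```latex
\begin{align}
\mu &= \frac{1}{nm}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}
     = \frac{1}{m}\sum_{i=1}^{m}
       \underbrace{\left(\frac{1}{n}\sum_{j=1}^{n} x_{ij}\right)}_{\text{mean of image } i} \\
\sigma^2 &= \frac{1}{nm}\sum_{i=1}^{m}\sum_{j=1}^{n} \left(x_{ij}-\mu\right)^2
     = \frac{1}{m}\sum_{i=1}^{m}
       \left(\frac{1}{n}\sum_{j=1}^{n} \left(x_{ij}-\mu\right)^2\right)
\end{align}
```

Both identities only regroup the double sum, which is why they need every image to contribute the same number n of pixels. Note that the inner term in the second line uses the global mean, not the mean of image i.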

Verification

Real mean and std:

import numpy as np

rng = np.random.default_rng(0)
vals = rng.integers(1, 100, size=100)  # data

mu = np.mean(vals)
print(mu)
print(np.std(vals))

50.93                 # real mean
28.048976808432776    # real standard deviation

Comparing it to the image-by-image approach:

n_images = 10

means = []
for subset in np.split(vals, n_images):
    means.append(np.mean(subset))
new_mu = np.mean(means)

variances = []
for subset in np.split(vals, n_images):
    var = np.mean((subset - mu) ** 2)
    variances.append(var)

print(new_mu)
print(np.sqrt(np.mean(variances)))

50.92999999999999     # calculated mean
28.048976808432784    # calculated standard deviation

Second solution (fully vectorized):

Using all the pixels of all images at once.

rgb_values = np.concatenate(
    [Image.open(img).getdata() for img in imgs_path], 
    axis=0
) / 255.

# rgb_values.shape == (n, 3), 
# where n is the total number of pixels in all images, 
# and 3 are the 3 channels: R, G, B.

# Each value is in the interval [0; 1]

mu_rgb = np.mean(rgb_values, axis=0)  # mu_rgb.shape == (3,)
std_rgb = np.std(rgb_values, axis=0)  # std_rgb.shape == (3,)
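If the images do not all fit in memory, the "load and process them one by one" variant mentioned at the top can be sketched with running sums. This is a minimal sketch, not a tested replacement for the code above: `streaming_mean_std` is a hypothetical helper, and it swaps in the identity Var[X] = E[X^2] - E[X]^2, which can lose some precision compared to `np.std` on very large datasets.

```python
import numpy as np

def streaming_mean_std(pixel_arrays):
    """Per-channel mean and std over an iterable of RGB pixel arrays.

    Each item is an array of shape (n_i, 3) or (H, W, 3) with values in [0, 1].
    Only one image has to be in memory at a time, and sizes may differ.
    """
    count = 0
    total = np.zeros(3, dtype=np.float64)      # running sum per channel
    total_sq = np.zeros(3, dtype=np.float64)   # running sum of squares per channel
    for pixels in pixel_arrays:
        pixels = np.asarray(pixels, dtype=np.float64).reshape(-1, 3)
        count += pixels.shape[0]
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
    mean = total / count
    # Population variance (same convention as np.std's default, ddof=0).
    var = total_sq / count - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))        # guard against tiny negative values
    return mean, std

# Possible usage with Pillow (assumes imgs_path is a list of file paths):
# from PIL import Image
# gen = (np.asarray(Image.open(p).convert("RGB")) / 255. for p in imgs_path)
# mu_rgb, std_rgb = streaming_mean_std(gen)
```

Because a generator is passed in, each image is loaded, reduced to two small sums, and then discarded.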

Is it necessary to concatenate all the images? Maybe it is more efficient to calculate the mean and std per image; the result should be the same. — Simone, Aug 15 '22 at 12:37
@Simone If ALL your images have the **exact same** number of pixels, then `overall_mean = sum(individual_means) / n_images` — Vladimir Fokow, Aug 15 '22 at 12:53
In case all images have the same number of pixels, only `sum(individual_means)` is needed, right? And why `rgb_stds = np.std(rgb_values - rgb_means, axis=0)`? It should be `rgb_stds = np.std(rgb_values, axis=0)`, right? — Simone, Aug 15 '22 at 13:09
@Simone, about the mean of means: https://stats.stackexchange.com/q/133138 — Vladimir Fokow, Aug 15 '22 at 13:12
@Simone because in one of the links that I’ve posted I have read that when normalizing data, first you subtract the mean, and then you divide by `std` that is calculated on the result (after mean subtraction). But feel free to reverse the order. As I said, I don’t really know how they did it for `imagenet_stats` — Vladimir Fokow, Aug 15 '22 at 13:16
About the sum of means: consider two examples: `((1+2)/2 + (3+4)/2 ) /2`, and `sum(1,2,3,4)/4` — Vladimir Fokow, Aug 15 '22 at 13:20
If all the images have the same dimensions, it is not necessary to divide by `N`. Regarding the `std`, maybe your formula is about the `Standard Error of the Mean` — Simone, Aug 15 '22 at 13:53
@Simone Well, if you don’t divide by n, you will get: `(1+2)/2 + (3+4)/2 = 5` while the real mean is `(1+2+3+4)/4=2.5` … — Vladimir Fokow, Aug 15 '22 at 13:57
@Simone, added example as you wanted (processing images one by one) and proved the equations that I use. Also, with how you load your images, the second (my original) implementation is better - use *it*. — Vladimir Fokow, Aug 15 '22 at 17:49
I think that in the second example the `std_rgb` will be exactly the same if you subtract `mu_rgb` from `rgb_values` or not. So I removed it — Vladimir Fokow, Aug 15 '22 at 20:30