I am working with a numpy.ndarray of 286 images, with shape (286, 16, 16, 3). Each image has 3 bands of float32 pixel values, and the maximum value in a band can exceed 255. Is it possible to normalize this array to the range [0, 1]?

Code for reading the images:

import os
import cv2

inputPath = 'E:/Notebooks/data'

images = []

# Load in the images
for filepath in os.listdir(inputPath):
    images.append(cv2.imread(inputPath + '/{0}'.format(filepath), flags=(cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)))
rayan
  • Normalize each of the 286 images individually, or over all images? – chillking Jun 08 '21 at 10:57
  • What do you mean by normalize? Taking the minimum and maximum float32 values, mapping them to 0 and 255, and distributing the in-between values across the 0 to 255 range? Or something else? – pippo1980 Jun 08 '21 at 11:04
  • Normalize each of the 286 images within this data set. – rayan Jun 08 '21 at 11:07
  • Ouch, I believe that's more difficult. Here: https://stackoverflow.com/questions/1735025/how-to-normalize-a-numpy-array-to-within-a-certain-range I found: # Normalised [0,255] as integer: don't forget the parenthesis before astype(int) c = (255*(a - np.min(a))/np.ptp(a)).astype(int) but I believe it normalizes over all the images. I can only think of splitting your array into 286 images and applying it to each one. Maybe there is another way. – pippo1980 Jun 08 '21 at 11:11
  • ii = (255*(i - np.min(i))/np.ptp(i)).astype(int) gives RuntimeWarning: invalid value encountered in true_divide with a numpy array of float32 type! Need to figure out why. – pippo1980 Jun 08 '21 at 17:08

3 Answers

If you want the values of every image to span 0 to 255, you can loop over the images, compute the minimum and maximum of each one, and rescale it linearly so that the minimum maps to 0 and the maximum maps to 255.

import numpy as np

images = np.random.rand(286,16,16,3).astype(np.float32)

for nr, img in enumerate(images):
    # Avoid the names min/max, which would shadow the Python builtins
    img_min = np.min(img)
    img_max = np.max(img)
    # Dividing before multiplying keeps the maximum at exactly 255
    images[nr] = (img - img_min) / (img_max - img_min) * 255
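
Since the question asks for the [0, 1] range, the same loop can be sketched without the factor of 255. This is only a minimal sketch: the random data is a stand-in for the real images, the result is written to a separate array (`normalized`, a name chosen here for illustration), and it assumes no image is constant, since a constant image would make the denominator zero.

```python
import numpy as np

# Synthetic stand-in for the real data: 286 float32 images with values above 255
rng = np.random.default_rng(0)
images = (rng.random((286, 16, 16, 3)) * 1000).astype(np.float32)

# Write the result to a new array so the original data is left untouched
normalized = np.empty_like(images)
for nr, img in enumerate(images):
    lo, hi = img.min(), img.max()
    # Assumes hi > lo; a constant image would divide by zero here
    normalized[nr] = (img - lo) / (hi - lo)

print(normalized.min(), normalized.max())
```

Each image is scaled independently, so every image ends up with its own minimum at 0 and its own maximum at 1.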
chillking
  • Each image contains 3 bands with varying pixel values with float32 data types – pippo1980 Jun 08 '21 at 17:09
  • works the same, doesn't it? Just use `images = np.random.rand(286,16,16,3).astype(np.float32)` – chillking Jun 09 '21 at 08:10
  • Haven't checked; I'll try if I have time. – pippo1980 Jun 09 '21 at 08:16
  • Tried your code (the original one). I get: 768 (16, 16, 3) 0.0 255.00000000000003 768 (16, 16, 3) 0.0 255.0 768 (16, 16, 3) 0.0 255.0 768 (16, 16, 3) 0.0 255.0 768 (16, 16, 3) 0.0 254.99999999999997 Not sure floats are permitted. – pippo1980 Jun 09 '21 at 08:36
  • images = np.random.rand(286,16,16,3).astype(np.float32) works, with the same glitch as float64. – pippo1980 Jun 09 '21 at 08:43
  • Do you mean that the maximum value isn't always exactly 255.0? That could be corrected by reordering the calculation. Instead of `(img-min) * 255 / (max-min)` we can use `(img-min) / (max-min) * 255`. I added this to the original answer as well. – chillking Jun 09 '21 at 09:19
  • No, I mean it isn't exactly 255. Not sure 255.0 is a valid entry for an image. I'll try it out. – pippo1980 Jun 09 '21 at 15:58
  • Nope, can't convert to an image with img = Image.fromarray(array). – pippo1980 Jun 09 '21 at 16:38
  • Aha, I'm not exactly sure because I don't know where the function `Image.fromarray(...)` comes from, but I guess you need to cast the values to uint8 before using your function. – chillking Jun 10 '21 at 11:04
  • @chillking pillow/PIL library – pippo1980 Jun 10 '21 at 12:40
  • I don't know why, but images[nr] = (((img - min) / (max - min) )* 255).astype(np.uint8) or .astype(int) doesn't work inside the enumerate loop. Any idea? Of course, images = images.astype(np.uint8) does the trick. – pippo1980 Jun 10 '21 at 13:30
  • I think (!) it's because images is one single array, not a list of arrays, and all elements of an array share the same data type. As the array was built with float32, one cannot write just a part of it as something other than float32. – chillking Jun 10 '21 at 15:52
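
The dtype behaviour described in the last comments can be checked with a small sketch. The array `scaled` below is a made-up stand-in for data already scaled to [0, 255]; it is not the asker's data.

```python
import numpy as np

# Stand-in for images already scaled to the 0..255 range, still stored as float32
scaled = np.random.rand(4, 16, 16, 3).astype(np.float32) * 255

# Assigning uint8 data into a slice converts it back to the array's own dtype
scaled[0] = scaled[0].astype(np.uint8)
print(scaled.dtype)    # float32

# Converting the whole array returns a new array with the requested dtype
as_uint8 = scaled.astype(np.uint8)
print(as_uint8.dtype)  # uint8
```

An ndarray has a single dtype for all of its elements, so the in-place assignment can change the values of one image but not its type.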

Vectorized is much faster than iterative

If you want to scale the pixel values of all your images using numpy arrays only, you may want to keep the vectorized nature of the operation (by avoiding loops).

Here is a way to scale your images:

# Getting min and max per image
maxis = images.max(axis=(1,2,3))
minis = images.min(axis=(1,2,3))
# Scaling without any loop
scaled_images = ((images.T - minis) / (maxis - minis) * 255).T
# timeit > 178 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

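The transposes (.T) are needed so that the subtraction broadcasts correctly: minis and maxis have shape (286,), and transposing moves the image axis to the last position, where that shape lines up with it. The final .T restores the original (286, 16, 16, 3) layout.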
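
An alternative that avoids the transposes is to pass `keepdims=True` to the reductions, so the per-image minima and maxima keep shape (286, 1, 1, 1) and broadcast directly. The sketch below uses random stand-in data and compares both variants; the variable names are chosen here for illustration.

```python
import numpy as np

# Synthetic stand-in for the real data
rng = np.random.default_rng(0)
images = (rng.random((286, 16, 16, 3)) * 1000).astype(np.float32)

# keepdims=True keeps the reduced axes with length 1: shape (286, 1, 1, 1)
minis = images.min(axis=(1, 2, 3), keepdims=True)
maxis = images.max(axis=(1, 2, 3), keepdims=True)
scaled_keepdims = (images - minis) / (maxis - minis) * 255

# The transposed variant, for comparison
flat_min = images.min(axis=(1, 2, 3))
flat_max = images.max(axis=(1, 2, 3))
scaled_transposed = ((images.T - flat_min) / (flat_max - flat_min) * 255).T

print(np.allclose(scaled_keepdims, scaled_transposed))
```

Both variants perform the same arithmetic on each pixel, so which one to use is mostly a matter of readability.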

We can check if this is correct:

print((scaled_images.min(axis=(1,2,3)) == 0).all())
# > True
print((scaled_images.max(axis=(1,2,3)) == 255).all())
# > True

Scaling into the [0, 1] range

If you want pixel values between 0 and 1, simply drop the multiplication by 255:

scaled_images = ((images.T - minis) / (maxis - minis)).T

Only with numpy arrays and such

You must also make sure you are handling a numpy array in the first place, not a list:

import numpy as np
images = np.array(images)
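
As a small illustration of the difference, the sketch below builds a list of arrays (`image_list`, a made-up stand-in for the list produced by the loading loop) and converts it. It assumes every entry has the same shape; `np.stack` raises an error otherwise.

```python
import numpy as np

# Stand-in for the list built by the loading loop: 286 separate (16, 16, 3) arrays
image_list = [np.zeros((16, 16, 3), dtype=np.float32) for _ in range(286)]

# A plain list has no array methods such as .max(axis=...)
print(hasattr(image_list, "max"))

# np.stack joins the arrays along a new first axis
images = np.stack(image_list)
print(images.shape, images.dtype)
```

When all entries have the same shape, `np.array(image_list)` gives the same result as `np.stack(image_list)`.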

OpenCV

On-the-go scaling

Since you are using opencv to read your images one by one, you can normalize your images on the go with it:

import os
import cv2

inputPath = 'E:/Notebooks/data'

max_scale = 1   # or 255 if needed
# Load in the images and normalize each one as it is read
images = [cv2.normalize(
    cv2.imread(inputPath+'/{0}'.format(filepath),flags=(cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)),
    None, 0, max_scale, cv2.NORM_MINMAX)
    for filepath in os.listdir(inputPath)]

Make sure you have images in the folder

inputPath='E:/Notebooks/data'
images = []

max_scale = 1   # or 255 if needed

# Load in the images 
for filepath in os.listdir(inputPath):
    image = cv2.imread(inputPath+'/{0}'.format(filepath),flags=(cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH))
    # Scale and append the list if it is an image
    if image is not None:
        images.append(cv2.normalize(image, None, 0, max_scale, cv2.NORM_MINMAX))

Bug on versions of open-cv prior to 3.4

OpenCV's normalize function has a known bug that can produce values below the alpha parameter (the fix is https://github.com/opencv/opencv/pull/11285). It was corrected in version 3.4.

Here is a way to scale images on-the-go with older versions of open-cv:

def custom_scale(img, max_scale=1):
    mini = img.min()
    return (img - mini) / (img.max() - mini) * max_scale

max_scale = 1   # or 255 if needed

images = [custom_scale(
    cv2.imread(inputPath+'/{0}'.format(filepath),flags=(cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)), max_scale)
    for filepath in os.listdir(inputPath)]
Whole Brain
  • I added the code for reading the images in the question. is it possible to normalize the images between [0-1]? when I apply your code it returns error: AttributeError: 'list' object has no attribute 'max' – rayan Jun 09 '21 at 11:18
  • I updated the answer. You said you had an `numpy ndarray` but it seems that `images` was a `list` in your case. – Whole Brain Jun 09 '21 at 11:24
  • Thanks for the reply. I use the list to read in the 286 images. When I check `print(type(images[1]))`, the result is `<class 'numpy.ndarray'>`. If I use `images = np.array(images)`, can I read in all the images? – rayan Jun 09 '21 at 11:37
  • Yes, you had `numpy arrays` inside a `list` called "images". I mentioned in my last edit that you should use `opencv` to normalize your images on the go, since you are already using it and adding your images iteratively. You don't need to use numpy or to cast your list into an array, for that. – Whole Brain Jun 09 '21 at 11:40
  • Thanks! I run the code without any error. may I know how I can extract the min and max of each image? I used `print(np.max(images[0]))`, but it returns None. – rayan Jun 09 '21 at 11:53
  • That's probably because you have non-image files in the `inputPath` folder. I updated the answer. – Whole Brain Jun 09 '21 at 12:03
  • I corrected an issue regarding the path of the data and `print(np.max(images[i]))` returning the min and max values. I think your code should be refined because `print(np.min(images[2]))` gives -6.0439154e-11. – rayan Jun 09 '21 at 12:03
  • I can see the negative values for min of some images – rayan Jun 09 '21 at 12:04
  • It might be a bug in old versions of openCV. Can you give the result of this `print(cv2.__version__)` – Whole Brain Jun 09 '21 at 12:12
  • This is "3.3.1" – rayan Jun 09 '21 at 12:15
  • The bug was fixed from `version 3.4` : https://github.com/opencv/opencv/pull/11285. You can update your opencv version, or use the numpy solution if you are constrained to use your current version. – Whole Brain Jun 09 '21 at 12:18
  • Thanks for your help and time. I could run it in Google Colab, and there the negative values are 0. I am not sure why I cannot upgrade or even uninstall OpenCV from Anaconda. – rayan Jun 09 '21 at 12:56
  • There could be multiple reasons for this. Have you tried `pip install --upgrade opencv-python` ? (If you are using pip and your environment is activated) – Whole Brain Jun 09 '21 at 13:04
  • print((scaled_images.min(axis=(1,2,3)) == 0).all()) # > True print((scaled_images.max(axis=(1,2,3)) == 255).all()) # > True, even though print(scaled_images.size, scaled_images.shape, np.min(scaled_images), np.max(scaled_images)) gives 219648 (286, 16, 16, 3) 0.0 255.0, and 255.0 is not a valid band value for Pillow. – pippo1980 Jun 09 '21 at 17:03
  • Pillow only wants 8 bits per channel for RGB images. Thus it's mandatory to cast an array's datatype to `uint8` specifically (if not already) for Pillow to be able to read it, with `an_array.astype(np.uint8)` or `np.uint8(an_array)`. – Whole Brain Jun 09 '21 at 17:28
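
A minimal sketch of the cast discussed in these comments, using only numpy and random stand-in data. Rounding before the cast is a suggestion added here, not part of the answer: `astype(np.uint8)` alone truncates toward zero, so a value such as 254.99999 would become 254.

```python
import numpy as np

# Synthetic stand-in for the real data
rng = np.random.default_rng(0)
images = (rng.random((286, 16, 16, 3)) * 1000).astype(np.float32)

# Per-image min-max scaling to the 0..255 range, still float32
minis = images.min(axis=(1, 2, 3), keepdims=True)
maxis = images.max(axis=(1, 2, 3), keepdims=True)
scaled = (images - minis) / (maxis - minis) * 255

# Round to the nearest integer, clip to the valid range, then cast
images_u8 = np.clip(np.rint(scaled), 0, 255).astype(np.uint8)
print(images_u8.dtype, images_u8.min(), images_u8.max())
```

Each `images_u8[i]` is then a (16, 16, 3) uint8 array, which is the 8-bits-per-channel form the comment above says Pillow expects for RGB images.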

I've figured out this piece of code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jun  8 13:19:17 2021

@author: Pietro


https://stackoverflow.com/questions/67885596/how-numpy-ndarray-can-be-normalized

"""


import numpy as np

arrayz = np.array(np.random.randn(286,16,16,3), dtype=np.float32)

print(arrayz.shape)

print((arrayz.size))

print(arrayz[0,0,0,:],'            ',type(arrayz[0,0,0,:]))
print(arrayz[0,0,0,0],'            ',type(arrayz[0,0,0,0]))

print(np.min(arrayz),'     ',np.max(arrayz))

arrayz_split = np.split(arrayz,286,0)

print(type(arrayz_split))

for i in arrayz_split:
    print(i.size,'  ', i.shape,'  ',  np.min(i),'   ', np.max(i))

arrayz_split_flat = []

for i in arrayz_split:
    ii = i[0]
    arrayz_split_flat.append(ii)
    
for i in arrayz_split_flat:
    print(type(i),'  ',i.size,'  ', i.shape,'  ',  np.min(i),'   ', np.max(i))
    
arrayz_split_flat_norm = []



for i in arrayz_split_flat:
      minz = np.min(i)
      manz = np.max(i)
      ii = ((i-minz)/(manz-minz)*255).astype(np.uint8)
      
      arrayz_split_flat_norm.append(ii)

for i in arrayz_split_flat_norm:
    
    print(type(i),'  ',i.size,'  ', i.shape,'  ',  np.min(i),'   ', np.max(i))

out_arr1 = np.stack((arrayz_split_flat_norm), axis = 0) 

print(type(out_arr1), out_arr1.size, '  ', out_arr1.shape, ' ',np.min(out_arr1),np.max(out_arr1), out_arr1[0,0,0,:],out_arr1[0,0,0,0])

I don't understand why:

arrayz = np.array(np.random.randn(286,16,16,3), dtype=np.float32)

seems to work while using:

arrayz1 = np.ndarray((286,16,16,3), dtype="float32")
arrayz = np.nan_to_num(arrayz1)

runs but throws:

 RuntimeWarning: overflow encountered in float_scalars
  ii = ((i-minz)/(manz-minz)*255).astype(np.uint8)
RuntimeWarning: invalid value encountered in true_divide
  ii = ((i-minz)/(manz-minz)*255).astype(np.uint8)

and I end up with a series of 16x16x3 arrays full of zeroes.
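
A likely explanation, offered as an assumption rather than a verified diagnosis: `np.ndarray(shape, dtype=...)` allocates memory without initializing it, so the array holds arbitrary values. After `np.nan_to_num`, an image may consist entirely of zeros (so max - min is 0 and 0/0 is invalid) or contain values so large that max - min overflows float32. The sketch below shows one way to guard against both cases; `normalize_image` is a hypothetical helper, not part of numpy.

```python
import numpy as np

def normalize_image(img):
    """Hypothetical helper: min-max scale one image to [0, 1].

    Images with a zero or non-finite value range are mapped to all zeros.
    """
    img = img.astype(np.float64)  # wider type so max - min cannot overflow float32
    lo, hi = img.min(), img.max()
    span = hi - lo
    if not np.isfinite(span) or span == 0:
        return np.zeros(img.shape, dtype=np.float32)
    return ((img - lo) / span).astype(np.float32)

# A constant image would otherwise trigger "invalid value encountered"
constant = np.zeros((16, 16, 3), dtype=np.float32)
print(normalize_image(constant).max())  # 0.0

# Values near the float32 limits would otherwise trigger "overflow encountered"
extreme = np.array([-3e38, 0.0, 3e38], dtype=np.float32)
print(normalize_image(extreme))         # [0.  0.5 1. ]
```

Mapping constant images to zeros is a design choice made for this sketch; depending on the use case, leaving such images unchanged or raising an error may be more appropriate.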

pippo1980
  • As far as I understood, he wants to normalize each image individually, not over the whole data set. So I think you need to move the min/max calculation into the loop. – chillking Jun 09 '21 at 08:13
  • Yes, sure, thanks for pointing that out. That's why I should always have test cases to check my scripts! – pippo1980 Jun 09 '21 at 08:17
  • Yes I think so. – chillking Jun 09 '21 at 09:23
  • @pippo1980 When answering a question, please try to create the most efficient, simple and formatted piece of code. All the prints are useless; all the line spacing are useless; and they make your answer very hard to 1. understand; 2. appreciate for someone who does not need to run the code to understand what it does. – Mathieu Jun 09 '21 at 09:49
  • @pippo1980 Also all those prints you are using to get information like type, shape, .. are displayed in the variable explorer of your IDE directly. You do not need to print them. – Mathieu Jun 09 '21 at 09:51
  • @ pippo the code gives error: AttributeError: 'list' object has no attribute 'shape'. by the way, is it possible to normalize between [0-1]? – rayan Jun 09 '21 at 11:05
  • @rayan I’ll check it out. Np.stack should produce an array. Give some time – pippo1980 Jun 09 '21 at 12:32
  • @Mathieu I am sorry about that, but I'm learning too, I need the prints to keep track of what is going on – pippo1980 Jun 09 '21 at 15:56
  • @rayan the last array, 'out_arr1', gives 219648 (286, 16, 16, 3) 0 255 [ 61 119 87] 61. I can't find the error you are talking about; could you post the entire trace? To normalize from 0 to 1, just use ii = ((i-minz)/(manz-minz)) – pippo1980 Jun 09 '21 at 16:16