
While trying to build a letter classifier in ML, I wrote this code to create the image data and the labels from the images in a folder using PIL:

import os
import numpy as np
from PIL import Image

# Images in the dataset are 32 X 32 pixels
IMG_HEIGHT = 32
IMG_WIDTH = 32

def create_dataset_PIL(img_folder):
    img_data_array = []
    class_name = []
    # Each subdirectory of img_folder holds the images of one class
    for dir1 in os.listdir(img_folder):
        print(dir1)
        for file in os.listdir(os.path.join(img_folder, dir1)):
            image_path = os.path.join(img_folder, dir1, file)
            image = np.array(Image.open(image_path))
            image = np.resize(image, (IMG_HEIGHT, IMG_WIDTH, 3))
            # Scale pixel values from [0, 255] to [0, 1]
            image = image.astype('float32')
            image /= 255
            img_data_array.append(image)
            class_name.append(dir1)  # label = subdirectory name
    return img_data_array, class_name
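
For reference, this is roughly how I call it (the folder name letters/ is just a placeholder for my dataset path):

img_data, class_names = create_dataset_PIL('letters/')  # hypothetical path
X = np.array(img_data)
print(X.shape)  # (num_images, 32, 32, 3)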

Each image in the dataset is already 32 X 32 pixels, and I am resizing it to an array of shape 32 X 32 X 3. But I don't understand: what is this 3rd dimension when all I need is 32 X 32 pixels?

I stumbled upon Numpy Resize/Rescale Image, where I learned this may be an interpolation parameter. From YouTube, I also learned that interpolation is required while resizing images. But I don't know what to do with this extra data. Should the size of the input layer of my Neural Network now be 32 X 32 X 3 instead of just 32 X 32?

Tanishka

2 Answers


The 3 represents the RGB (Red-Green-Blue) values. Each pixel of the image is represented by 3 values instead of one. In a black & white image, each pixel would be represented by [pixel]; in an RGB image, each pixel is represented by [pixel(R), pixel(G), pixel(B)].

In fact, each pixel of the image has 3 RGB values. These range between 0 and 255 and represent the intensity of Red, Green, and Blue: a higher value stands for higher intensity and a lower value for lower intensity. For instance, one pixel can be represented as a list of these three values, such as [78, 136, 60]. Black would be represented as [0, 0, 0].
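
A small sketch to make this concrete (the pixel values are made up for illustration):

import numpy as np

# A 2 X 2 RGB image: each pixel holds three values (R, G, B)
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [0, 0, 0]]], dtype=np.uint8)

print(rgb.shape)   # (2, 2, 3) -> height, width, channels
print(rgb[0, 0])   # [255   0   0] -> a pure red pixel
print(rgb[1, 1])   # [0 0 0] -> a black pixel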

And yes: your input layer should match this shape, 32 X 32 X 3.
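
For example, a minimal sketch assuming a Keras model (the question doesn't say which framework is used; the hidden layer size and the 26 output classes are illustrative assumptions):

from tensorflow import keras

# The input shape matches (height, width, channels) of the images
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),    # illustrative hidden layer
    keras.layers.Dense(26, activation='softmax')  # assuming 26 letter classes
])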

Niv Dudovitch
  • Thank you for your answer. I have a doubt: if my model is just a text classifier, should I not convert my RGB images to grayscale? That would mean less complexity for the NN? – Rishabh Kumar Singh Sep 05 '21 at 16:23
  • Grayscale simply reduces complexity: from a 3D pixel value (R, G, B) to a 1D value. Many tasks do not fare better with 3D pixels (e.g. edge detection), but there is a chance that downscaling the number of dimensions will lose important information. In the end, you can try both and see which works better for you (accuracy/runtime). – Niv Dudovitch Sep 05 '21 at 16:42
  • Thank you. I will do the same. – Rishabh Kumar Singh Sep 06 '21 at 05:09

The 3rd dimension of a digital image contains information about the color present at the (x, y) coordinate in the image; it is also called the color channel.

Most common channel types:

  • RGB mode: the value is 3, for example image_shape: [32, 32, 3]
  • Grayscale mode: the value is 1, for example image_shape: [32, 32, 1]

If your ML model doesn't need the colour feature, you can use scikit-image to convert the images to grayscale through rgb2gray, as in the sketch below.
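
A minimal sketch, assuming the image is already loaded as an RGB NumPy array (the random array below just stands in for a real image):

import numpy as np
from skimage.color import rgb2gray

rgb_image = np.random.rand(32, 32, 3)  # stand-in for a real 32 X 32 RGB image
gray_image = rgb2gray(rgb_image)

print(rgb_image.shape)   # (32, 32, 3)
print(gray_image.shape)  # (32, 32) -- the channel dimension is gone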

You can learn more about working with images in NumPy here.

  • Thank you for your answer. Please let me know: if I convert my RGB images to grayscale, will it be bad for an image classifier model? – Rishabh Kumar Singh Sep 05 '21 at 16:25
  • When it comes to machine learning, it is always iterative experimentation: we start with base parameters, evaluate performance, and then tweak the parameters based on an educated guess to check whether the model performs better than the base model. The cycle continues until reasonable performance is achieved. Here, train the model with RGB images and with grayscale images separately, then choose the better option based on model performance (prediction accuracy). – prakash sellathurai Sep 05 '21 at 18:59
  • Right. Thank you. – Rishabh Kumar Singh Sep 06 '21 at 05:10