
I've finished the deep learning course on Kaggle Learn and have started writing a model for the MNIST digit dataset. I like to understand the code I learn from, and I came across this bit:

def data_prep(raw):
    out_y = keras.utils.to_categorical(raw.label, num_classes)

    num_images = raw.shape[0]
    x_as_array = raw.values[:,1:]
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
    out_x = x_shaped_array / 255
    return out_x, out_y

This part really confuses me; I don't understand most of it. Could somebody explain, step by step, what every line of code does? And how would this work on a color image? I know this is a bit broad. Later on I'm going to work with color images, but I'm not sure how, because I can see the black-and-white 'parameters' here (the 1 in the reshaping of the array, the division by 255).

Sidenote: raw is a pandas dataframe

pppery
Mint Studios
    This question is far too broad. [Here is a great guide to debugging](https://stackoverflow.com/questions/4929251/how-to-step-through-python-code-to-help-debug-issues) that should give you a place to start. Have you tried checking the output of each variable after each step to make sense of it? That would be a great first step to take – G. Anderson Oct 02 '20 at 19:53

1 Answer


Adding comments above each line to explain its purpose:

#input is a 2D dataframe: one row per image, with a label column followed by one column per pixel
def data_prep(raw):
    #convert the class labels in raw.label to a binary matrix with num_classes columns
    #this is known as one-hot encoding and is common for classification targets
    out_y = keras.utils.to_categorical(raw.label, num_classes)

    #first dimension of raw is the number of images; each row in the df represents an image
    num_images = raw.shape[0]

    #drop the first column, which holds the label (raw.label) rather than a pixel, and convert the rest into an array of pixel values
    #ML libraries usually expect arrays rather than a pandas dataframe
    x_as_array = raw.values[:,1:]

    #reshape the flat pixel rows into a 4-dimensional array
    #1st dim: number of images
    #2nd dim: height of each image (i.e. rows when represented as an array)
    #3rd dim: width of each image (i.e. columns when represented as an array)
    #4th dim: the number of color channels, which is 3 (RGB) for colored images and 1 for gray-scale images
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)

    #this scales the pixel values to the range 0-1, since the raw values range from 0-255
    out_x = x_shaped_array / 255

    return out_x, out_y
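
To see what each step produces, here is a minimal sketch on made-up data. It assumes a hypothetical array `raw_values` standing in for `raw.values` (three 2x2 "images", label in the first column), and it builds the one-hot matrix with plain NumPy instead of calling `keras.utils.to_categorical`, so it runs without Keras installed:

```python
import numpy as np

# Hypothetical toy data standing in for raw.values: 3 "images" of 2x2 pixels.
# Column 0 holds the label; the other 4 columns hold pixel intensities (0-255).
raw_values = np.array([
    [1,   0, 255, 128,  64],
    [0, 255, 255,   0,   0],
    [2,  10,  20,  30,  40],
])
num_classes, img_rows, img_cols = 3, 2, 2

# One-hot encode the labels with plain NumPy (a stand-in for to_categorical):
# row i of the identity matrix is the one-hot vector for class i.
labels = raw_values[:, 0]
out_y = np.eye(num_classes)[labels]

# Same steps as data_prep: drop the label column, reshape, scale to 0-1.
num_images = raw_values.shape[0]
x_as_array = raw_values[:, 1:]
x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
out_x = x_shaped_array / 255

print(out_y)        # one row per image, a single 1 in the column of its class
print(out_x.shape)  # (3, 2, 2, 1)
```

Printing the intermediate variables like this on your own dataframe is a quick way to check that the shapes match `img_rows` and `img_cols`.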

To deal with color images, the 4th dimension of the array should have size 3, one entry per RGB channel. The division by 255 stays the same as long as each channel is stored as values from 0-255. Check out this tutorial for more in-depth information on CNNs and their inputs.
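
As a rough sketch of the color case: the only change to the reshape is the last dimension. The example below assumes a hypothetical layout in which each row stores one image flattened pixel by pixel as R, G, B, R, G, B, ...; if your data is laid out differently (for example, all red values first), the reshape would need to change accordingly.

```python
import numpy as np

img_rows, img_cols, channels = 2, 2, 3  # hypothetical tiny RGB images

# Two made-up flattened images: 2 * 2 * 3 = 12 values per row.
# Assumed layout: pixel by pixel, with R, G, B stored next to each other.
flat = np.arange(2 * img_rows * img_cols * channels).reshape(2, -1)

# Same idea as the gray-scale case, but the last dimension is 3 instead of 1.
x = flat.reshape(flat.shape[0], img_rows, img_cols, channels) / 255

print(x.shape)     # (2, 2, 2, 3)
print(x[0, 0, 0])  # the scaled R, G, B values of the top-left pixel of image 0
```

If you load color images from files rather than from a flat table, many image libraries already return each image as a height x width x 3 array, in which case you would only need to stack the images and divide by 255.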

Akanksha Atrey