Okay, I'm going to try to answer this as well as I can, but producing and pre-processing data for use in ML algorithms is laborious and often expensive (hence the repeated use of well known data sets for testing algorithm designs).
To address a few straightforward questions first:
should I run a procedure which crop the JPG to extract only the vehicle portion?
No, this isn't necessary. The neural network will itself sort the relevant information in the images from the irrelevant, and having a diverse set of images will help to build a robust classifier. Also, cropping would likely make life a lot more difficult for you later on when you come to resize the images (see point 1 below for more).
How could I do that using TensorFlow?
You wouldn't. TensorFlow is designed to build and test ML models, and does not have tools for pre-processing data (well, perhaps TensorFlow Extended does, but that shouldn't be necessary here).
Now a rough guideline for how you would go about creating a data set from the files described:
1) The first thing you will need to do is load your .jpg images into Python and resize them all to be identical. A neural network needs the same number of inputs (pixels, in this case) in every training example, so differently sized images will not work.
- There is a good answer detailing how to load images using the Python Imaging Library (PIL) on Stack Overflow here.
- The PIL image instances (elements of the list loadedImages in the example above) can then be converted to numpy arrays using data = np.asarray(image), which TensorFlow can work with.
In addition to building a set of numpy arrays of your data, you will also need a second numpy array of labels for this data. A typical way to encode this is as a numpy array the same length as your number of images, with an integer value for each element giving the class to which that image belongs (0-8 for your 9 classes). You could enter these by hand, but this would be labour intensive; I would suggest using the inbuilt find method of Python strings to locate keywords within the filenames and automate determining their class. This could be done within the for image in imagesList: loop in the above link, as image should be a string containing the image filename.
- As I mentioned above, resizing the images is necessary to make sure they are all identical. You could do this with numpy, using indexing to choose a subsection of each image array, or with PIL's resize function before converting to numpy. There is no single right answer here; many methods have been used to resize images for this purpose, from padding to stretching to cropping. A rough sketch putting these steps together is given below.
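To make step 1 more concrete, here is a minimal sketch of the loading, resizing and labelling process described above. The folder path, target size and class keywords are placeholders you would replace with your own. Note that this sketch stacks the images with the image index as the leading axis (shape [n, h, w, 3]), which is the layout tf.data expects when slicing examples; the text below describes the equivalent data with n as the last axis.

```python
import os
import numpy as np
from PIL import Image

# Placeholder values -- adjust to your own data.
IMAGE_DIR = "path/to/images"               # folder containing the .jpg files
TARGET_SIZE = (128, 128)                   # (width, height) every image is resized to
CLASS_KEYWORDS = ["car", "truck", "bus"]   # one keyword per class (you mention 9 classes)

images = []
labels = []

for filename in os.listdir(IMAGE_DIR):
    if not filename.lower().endswith(".jpg"):
        continue

    # Load and resize with PIL, then convert to a numpy array of shape (height, width, 3).
    img = Image.open(os.path.join(IMAGE_DIR, filename)).convert("RGB")
    img = img.resize(TARGET_SIZE)
    images.append(np.asarray(img))

    # Derive the integer class label from keywords in the filename using str.find.
    for class_index, keyword in enumerate(CLASS_KEYWORDS):
        if filename.find(keyword) != -1:
            labels.append(class_index)
            break
    else:
        raise ValueError(f"No class keyword found in {filename}")

# Stack into single arrays: data has shape (n, height, width, 3), labels has shape (n,).
data = np.stack(images, axis=0)
labels = np.asarray(labels, dtype=np.int64)
```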
The end result here should be two numpy arrays. One of image data, which has shape [w, h, 3, n], where w = image width, h = image height, 3 = the three RGB layers (provided the images are in colour) and n = the number of images you have. The second of labels associated with these images, of shape [n,], where every element of the length-n array is an integer from 0-8 specifying its class.
At this point it would be a good idea to save the dataset in this format using numpy.save() so that you don't have to go through this process again.
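For example (the filenames here are arbitrary):

```python
import numpy as np

# Save both arrays so the preprocessing only has to run once.
np.save("image_data.npy", data)
np.save("image_labels.npy", labels)

# In a later session, reload them with:
data = np.load("image_data.npy")
labels = np.load("image_labels.npy")
```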
2) Once you have your images in this format, TensorFlow has a class called tf.data.Dataset into which you can load the image and label data described above, and which will allow you to shuffle and sample batches from it.
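A minimal sketch, assuming TensorFlow 2 and the data/labels arrays built above (the shuffle buffer and batch size are placeholder values):

```python
import tensorflow as tf

# from_tensor_slices slices along the first axis, so `data` should have the
# image index as its leading dimension (shape [n, h, w, 3]) and `labels` shape [n,].
dataset = tf.data.Dataset.from_tensor_slices((data, labels))

# Shuffle the examples and group them into batches for training.
dataset = dataset.shuffle(buffer_size=1000).batch(32)

# The batches can then be iterated over directly (or the dataset passed to model.fit):
for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape, batch_labels.shape)
```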
I hope that was helpful, and I am sorry that there is no quick-fix solution to this (at least not one I am aware of). Good luck.