34

I am currently trying to make a program to differentiate rotten oranges and edible oranges solely based on their external appearance. To do this, I am planning on using a Convolutional Neural Network to train with rotten oranges and normal oranges. After some searching I could only find one database of approx. 150 rotten oranges and 150 normal oranges on a black background (http://www.cofilab.com/downloads/). Obviously, a machine learning model will need at least few thousand oranges to achieve an accuracy above 90 or so percent. However, can I alter these 150 oranges in some way to produce more photos of oranges? By alter, I mean adding different shades of orange on the citrus fruit to make a "different orange." Would this be an effective method of training a neural network?

Rehaan Ahmad
  • 794
  • 8
  • 23

4 Answers4

10

It is a very good way to increase the number of date you have. What you'll do depends on your data. For example, if you are training on data obtained from a sensor, you may want to add some noise to the training data so that you can increase your dataset. After all, you can expect some noise coming from the sensor later on.

Assuming that you will train it on images, here is a very good github repository that provides means to use those techniques. This python library helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images. Link: https://github.com/aleju/imgaug

Features:

  • Most standard augmentation techniques available.

  • Techniques can be applied to both images and keypoints/landmarks on images. Define your augmentation sequence once at the start of the experiment, then apply it many times.

  • Define flexible stochastic ranges for each augmentation, e.g. "rotate each image by a value between -45 and 45 degrees" or "rotate each image by a value sampled from the normal distribution N(0, 5.0)".

  • Easily convert all stochastic ranges to deterministic values to augment different batches of images in the exactly identical way (e.g. images and their heatmaps).

enter image description here

Dellein
  • 343
  • 2
  • 11
3

Data augmentation is what you are looking for. In you case you can do different things:

  1. Apply filters to get slightly different image, as has been said you can use gaussian blur.

  2. Cut the orange and put it in different backgrounds.

  3. Scale the oranges with different scales factors.

  4. Rotate the images.

  5. create synthetic rotten oranges.

  6. Mix all different combinations of the previous mentioned. With this kind of augmentation you can easily create thousand of different oranges.

I did something like that with a dataset of 12.000 images and I can create 630.000 samples

2

That is indeed a good way to increase your data set. You can, for example, apply Gaussian blur to the images. They will become blurred, but different from the original. You can invert the images too. Or, in last case, look for new images and apply the cited techniques.

Lucas Borsatto
  • 183
  • 1
  • 10
0

Data augmentation is really good way to boost training set but still not enough to train a deep network end to end on its own given the possibility that it will overfit. You should look at domain adaptation where you take a pretrained model like inception which is trained on imagenet dataset and finetune it for your problem. Since you have to learn only parameters required to classify your use case, it is possible to achieve good accuracies with relatively less training data available. I have hosted a demo of classification with this technique here. Try it out with your dataset and see if it helps. The demo takes care of pretrained model as well as data augmentation for dataset that you will upload.

pratsJ
  • 3,369
  • 3
  • 22
  • 41