Create labeled image dataset for machine learning models

Question

How do I create a labeled image dataset for machine learning?

I have always worked with ready-made datasets, so I am having difficulty working out how to label an image dataset myself (as is done for cat vs. dog classification).

I have to do labeling as well as image segmentation. After searching the internet, I found some manual labeling tools such as LabelMe and LabelBox. LabelMe is good, but it returns its output as XML files.

Now my concern is how to feed XML files into a neural network. I am not at all good at image processing tasks, so I need an alternative suggestion.

Edit: I have scanned copies of degree certificates and ordinary documents. I have to build a classifier that labels degree certificates as 1 and non-degree certificates as 0. So my labels would be:
Degree_certificate -> y(1)
Non_degree_cert -> y(0)

score 1 · Answer 1 · answered Oct 17 '18 at 07:03

You don't feed XML files to the neural network. You process them with an XML parser, and use that to extract the label. See the question How do I parse XML in Python? for advice on how this works.
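As a minimal sketch of that parsing step, the snippet below uses the standard-library ElementTree on a made-up annotation. The element names (`filename`, `object`, `name`) and the sample values are assumptions for illustration, not a confirmed LabelMe schema, so inspect your own XML files and adjust the paths accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; the element names are assumptions,
# not a confirmed description of what LabelMe exports.
SAMPLE_XML = """
<annotation>
  <filename>scan_001.jpg</filename>
  <object>
    <name>Degree_certificate</name>
  </object>
</annotation>
"""

def extract_label(xml_text):
    """Return (image filename, label of the first annotated object)."""
    root = ET.fromstring(xml_text.strip())
    # findtext returns None when the element is missing, so fall back to "".
    filename = (root.findtext("filename") or "").strip()
    label = (root.findtext("object/name") or "").strip()
    return filename, label

filename, label = extract_label(SAMPLE_XML)
print(filename, label)
```

Only the extracted label (and the image it refers to) is passed on to training; the XML itself never reaches the network.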

Image data sets can come in a variety of starting states. Sometimes, for instance, images are stored in folders that represent their class. If you prefer to work that way, then rather than reading the XML file directly every time you train, use it once to create a data set in the form that you like or are used to. The reason you find many nice ready-prepared data sets online is that other people have done exactly this. It is worth doing, as you then don't need to repeat all the transformations from raw data just to start training a model.
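To illustrate the folder-per-class layout, here is a rough sketch that turns such a directory into (image path, label) pairs. The folder names reuse the class names from the question, and the list of image suffixes is an assumption; many CNN packages ship their own loaders for this layout, so treat this only as a picture of the idea.

```python
from pathlib import Path

# Hypothetical folder names, mapped to the 1/0 labels from the question.
CLASS_TO_LABEL = {"Degree_certificate": 1, "Non_degree_cert": 0}
# Assumed set of image file types; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def list_labeled_images(dataset_dir):
    """Return (image_path, label) pairs from a folder-per-class layout."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(dataset_dir) / class_name
        if not class_dir.is_dir():
            continue  # a class with no folder simply contributes no samples
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

Calling `list_labeled_images("dataset")` on a directory containing those two folders gives a list you can shuffle and split before training.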

For example, collect your XML data from LabelMe, then use a short script to read each XML file, extract the label you entered previously using ElementTree, and copy the image to the correct folder. You will end up with a data set consisting of two folders, one of positive and one of negative images, ready to process with your favourite CNN image-processing package.
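The script described above might look roughly like this. It assumes one XML file per image, with the image name in a `filename` element and the label in `object/name`; those element names, the function name, and the directory arguments are all hypothetical, so adapt them to whatever your annotation files actually contain.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def sort_images_by_label(xml_dir, image_dir, output_dir):
    """Copy each annotated image into output_dir/<label>/; return the copy count."""
    copied = 0
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # Assumed element names; check them against your own XML files.
        filename = (root.findtext("filename") or "").strip()
        label = (root.findtext("object/name") or "").strip()
        source = Path(image_dir) / filename
        if not filename or not label or not source.is_file():
            print(f"Skipping {xml_path.name}: missing label or image")
            continue
        target_dir = Path(output_dir) / label
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / source.name)
        copied += 1
    return copied
```

A call such as `sort_images_by_label("annotations", "images", "dataset")` would then leave one sub-folder per label under `dataset`. Copying rather than moving keeps the raw data untouched, so the script can be rerun if the labels change.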

Thank you so much for the suggestion, I will surely try it. One more question is where and how to extract the label using ElementTree. — dolly vaishnav, Oct 17 '18 at 07:20
@dollyvaishnav: I have not used LabelMe, so I don't know. You will need to inspect the XML it produces, maybe in a text editor, and learn just enough XML to understand what it is you are looking at. The LabelMe documentation may explain more. — Neil Slater, Oct 17 '18 at 07:25
I haven't done much in bulk. But for a classification task, I would just sort the images into folders directly, then review them. Fine for < 1000 images. — Neil Slater, Oct 17 '18 at 07:52
