
I am using the scikit-learn library to perform a supervised classification (Support Vector Machine classifier) on a satellite image. My main issue is how to train my SVM classifier. I have watched many videos on YouTube and read a few tutorials on how to train an SVM model in scikit-learn, and all of them use the famous Iris dataset. To perform a supervised SVM classification in scikit-learn we need labels. For the Iris dataset we have iris.target, which holds the labels ('setosa', 'versicolor', 'virginica') we are trying to predict. The training procedure is straightforward when following the scikit-learn documentation.

In my case, I have a SAR satellite image captured over an urban area, and I need to classify the urban area, roads, river and vegetation (4 classes). This image has two bands, but I do not have label data for the classes I am trying to predict, unlike the Iris data.

So, my question is, do I have to manually create vector data (for the 4 classes) in order to train the SVM model? Is there an easier way to train the model than manually creating vector data? What do we do in this case?

I am a bit confused, to be honest. I would appreciate any help.

  • I'm not sure I understand your question. If you do not have labelled data, you can't use a supervised learning technique... but maybe I am not understanding something about satellite image data... – juanpa.arrivillaga Apr 10 '17 at 19:31
  • Hi juanpa.arrivillaga, thanks for your answer. So I have to create training data manually for my satellite image, I suppose. The training process confuses me a little bit. – Johny Apr 10 '17 at 19:34
  • One possible approach is to use openstreetmaps.org to generate test data to train your model, since you likely have coordinates for your imagery. The difficulty will be in parsing OSM data into the categories you need, but the format is well documented and there are libraries to help you. – Yacine Filali Apr 10 '17 at 19:58
  • Thanks for your answer. – Johny Apr 10 '17 at 20:05

2 Answers


Here's a complete example that should get you on the right track. For the sake of simplicity, let us assume that your goal is to classify the pixels of the three-band image below into three categories, namely building, vegetation and water. Those categories will be displayed in red, green and blue, respectively.

New York

We start off by reading the image and defining some variables that will be used later on.

import numpy as np
from skimage import io

img = io.imread('https://i.stack.imgur.com/TFOv7.png')

rows, cols, bands = img.shape
classes = {'building': 0, 'vegetation': 1, 'water': 2}
n_classes = len(classes)
palette = np.uint8([[255, 0, 0], [0, 255, 0], [0, 0, 255]])  # RGB color used to display each class

Unsupervised classification

If you don't wish to manually label some pixels then you need to detect the underlying structure of your data, i.e. you have to split the image pixels into n_classes partitions, for example through k-means clustering:

from sklearn.cluster import KMeans

X = img.reshape(rows*cols, bands)  # one sample per pixel, one feature per band
kmeans = KMeans(n_clusters=n_classes, random_state=3).fit(X)
unsupervised = kmeans.labels_.reshape(rows, cols)

io.imshow(palette[unsupervised])

unsupervised classification

Supervised classification

Alternatively, you could assign labels to some pixels of known class (the set of labeled pixels is usually referred to as ground truth). In this toy example the ground truth is made up of three hardcoded square regions of 20×20 pixels shown in the following figure:

ground truth

supervised = n_classes*np.ones(shape=(rows, cols), dtype=int)  # n_classes marks unlabeled pixels

supervised[200:220, 150:170] = classes['building']
supervised[40:60, 40:60] = classes['vegetation']
supervised[100:120, 200:220] = classes['water']

The pixels of the ground truth (training set) are used to fit a support vector machine.

y = supervised.ravel()
train = np.flatnonzero(supervised < n_classes)   # indices of the labeled (ground-truth) pixels
test = np.flatnonzero(supervised == n_classes)   # indices of the remaining pixels to classify

from sklearn.svm import SVC

clf = SVC(gamma='auto')
clf.fit(X[train], y[train])
y[test] = clf.predict(X[test])
supervised = y.reshape(rows, cols)

io.imshow(palette[supervised])

After the training stage, the classifier assigns class labels to the remaining pixels (test set). The classification results look like this:

supervised classification

Final remarks

Results seem to suggest that the unsupervised classification is more accurate than its supervised counterpart in this example, even though supervised classification generally outperforms unsupervised classification. It is important to note that accuracy could be dramatically improved here by tuning the parameters of the SVM classifier. Further improvement could be achieved by enlarging and refining the ground truth, since the train/test ratio is very small and the red and green patches actually contain pixels of different classes. Finally, one can reasonably expect that using more sophisticated features, such as ratios or indices computed from the intensity levels (for instance NDVI), would boost performance.
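As a rough illustration of the parameter tuning mentioned above, a small cross-validated grid search over C and gamma could be run on the labeled pixels. This is only a sketch that reuses the X, y and train variables defined earlier; the parameter grid itself is an arbitrary choice, not something prescribed by the answer.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values for the regularization and kernel parameters (arbitrary choices).
param_grid = {'C': [1, 10, 100], 'gamma': ['scale', 0.1, 0.01]}

# 3-fold cross-validated grid search restricted to the labeled (training) pixels.
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X[train], y[train])
print(search.best_params_)

# The tuned classifier can then replace clf in the code above.
clf = search.best_estimator_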

  • This was a very complete and interesting answer to my problem. You put me on the right track. Thank you for your answer, I appreciate it. – Johny Apr 21 '17 at 09:58
  • Very direct and complete example for newcomers who are lost among the thousands of references on the web. – GCGM Oct 30 '19 at 16:59

My Solution:-

Manual Processing:-

If your dataset is small, you can manually create the vector data (which is also more reliable when you create it yourself). If not, it is much more difficult to apply an SVM to classify the images.
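If you go the manual route, one possible workflow (not part of the original answer) is to digitize polygons for each class in a GIS tool such as QGIS and burn them into a label raster aligned with the image, for example with rasterio and geopandas. The file names sar_image.tif and training_polygons.shp and the class_id attribute below are hypothetical placeholders:

import numpy as np
import rasterio
from rasterio import features
import geopandas as gpd

# Read the image and keep its georeferencing information.
with rasterio.open('sar_image.tif') as src:           # hypothetical file name
    img = src.read()                                  # shape: (bands, rows, cols)
    transform = src.transform
    out_shape = (src.height, src.width)

# Polygons digitized manually, with a 'class_id' attribute (0..3 for the 4 classes).
gdf = gpd.read_file('training_polygons.shp')          # hypothetical file name

# Burn the polygons into a label array; 255 marks unlabeled pixels.
shapes = zip(gdf.geometry, gdf['class_id'])
labels = features.rasterize(shapes, out_shape=out_shape,
                            transform=transform, fill=255, dtype='uint8')

# The labeled pixels can then be used to train an SVM as in the other answer.
X = img.reshape(img.shape[0], -1).T                   # one row per pixel, one column per band
train = np.flatnonzero(labels.ravel() != 255)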

Automatic Processing:-

Step 1:-

You can use an unsupervised image clustering technique (e.g. the k-means clustering algorithm) to group your images into those 4 categories, then label the images from 1 to 4 after the clustering is done.

Step 2:-

Now you have a dataset of labeled images. Split it into train and test data.

Step 3:-

Now apply an SVM to classify your test images and find out your model's accuracy (see the sketch below).
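A minimal sketch of the three steps above, assuming the samples are the pixels of a two-band image (random data is used here as a stand-in, and all variable names are illustrative). Note that the accuracy measured this way only tells you how well the SVM reproduces the k-means pseudo-labels, not how well it matches the real land-cover classes:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-in for a two-band image: (rows, cols, bands) of random values.
rng = np.random.default_rng(0)
img = rng.random((100, 100, 2))
X = img.reshape(-1, 2)                       # one sample per pixel

# Step 1: unsupervised clustering into 4 groups, used as pseudo-labels.
y = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X)

# Step 2: split the pseudo-labeled pixels into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 3: train an SVM and measure its accuracy against the pseudo-labels.
clf = SVC(gamma='scale').fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))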

  • I don't see the point of classifying pixels through k-means clustering and then applying an SVM classifier on the labeled pixels. There are situations in which combining supervised and unsupervised learning can be beneficial (refer to [this thread](https://stats.stackexchange.com/questions/178535/mixing-unsupervised-and-supervised-learning) for details) but - as I see it - the _automatic processing_ approach you suggest does not make sense. – Tonechas Apr 20 '17 at 16:53
  • Thanks for pointing out this problem. But I only mentioned k-means as an example. The real point is that we can group different images using any better unsupervised learning algorithm and then label them based on the group each image is clustered into. After this process, every single image has its own class label, so we can apply supervised learning algorithms for the classification process. If you still have any query, please look at the post next to this reply. – Karthik Sekaran Apr 21 '17 at 04:35
  • If you are able to correctly label the pixels through whatever unsupervised learning algorithm then you don't need to reclassify the pixels through supervised learning. By doing so, the outcome of the supervised classification is likely to be less accurate than that of its unsupervised counterpart. BTW I'm the author of the post that you refer me to :-) – Tonechas Apr 21 '17 at 11:54
  • Ha ha, I never looked at the author's name. What you said is correct without any doubt. But the question is about how to "classify" the images. As you said, unsupervised learning suits this scenario perfectly. My intention is not to give a perfect solution to the question, but to help if I can. That's why I gave my answer like this. If you feel that the answer is very poor, you can obviously hit the downvote button. – Karthik Sekaran Apr 21 '17 at 14:29