As far as I know, pre-trained models perform well as feature extractors in many tasks, thanks to the large datasets they were trained on.
However, I'm wondering whether such a model, say VGG-16, has some ability to extract "semantic" information from an input image.
If the answer is positive, given an unlabeled dataset, is it possible to "cluster" images by measuring the semantic similarity of their extracted features?
Here is what I have tried so far (code sketches follow the list):
- Load the pre-trained VGG-16 through PyTorch.
- Load the CIFAR-10 dataset and transform it into a batched tensor X of size (5000, 3, 224, 224).
- Fine-tune vgg.classifier, setting its output dimension to 4096.
- Extract features:

      features = vgg.features(X).view(X.shape[0], -1)  # X: (5000, 3, 224, 224) -> (5000, 25088)
      features = vgg.classifier(features)              # (5000, 25088) -> (5000, 4096)
      return features                                  # (5000, 4096)
- Try out cosine similarity, the inner product, and torch.cdist as similarity/distance measures, only to find several bad clusters (see the clustering sketch below).
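For reference, here is a minimal sketch of the pipeline above, assuming torchvision's vgg16 with ImageNet weights and standard ImageNet normalization (the batch size and the use of the test split are placeholders on my side):

    import torch
    import torchvision
    from torchvision import transforms

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Pre-trained VGG-16; truncating the classifier after the second FC layer
    # yields the 4096-d penultimate representation instead of 1000 logits.
    vgg = torchvision.models.vgg16(pretrained=True).to(device).eval()
    vgg.classifier = vgg.classifier[:-1]

    # CIFAR-10 images resized to 224x224 and normalized with the ImageNet
    # statistics that VGG-16 was trained with.
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    dataset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                           download=True, transform=transform)
    loader = torch.utils.data.DataLoader(dataset, batch_size=64)

    # Extract features batch by batch under no_grad; pushing all images
    # through the network in one tensor easily exhausts GPU memory.
    feats = []
    with torch.no_grad():
        for x, _ in loader:
            feats.append(vgg(x.to(device)).cpu())  # (B, 4096) per batch
    features = torch.cat(feats)                    # (N, 4096)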
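For the last step, this is roughly how the similarity measurement and clustering look; sklearn's KMeans is only an illustrative choice of clustering algorithm on my part:

    import torch.nn.functional as F
    from sklearn.cluster import KMeans

    # L2-normalize so that inner products between rows equal cosine similarities.
    normed = F.normalize(features, dim=1)
    cos_sim = normed @ normed.t()  # (N, N) pairwise cosine-similarity matrix

    # Cluster the normalized features; k = 10 matches the CIFAR-10 classes.
    kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
    labels = kmeans.fit_predict(normed.numpy())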
Any suggestions? Thanks in advance.