I am exploring an ensemble technique for semantic segmentation models. My initial idea was to combine a support vector machine with UNet/ResNet/DeepLabV3 by using the SVM as the last layer. I found that I could use hinge loss as the loss function and that this behaves like a (linear) support vector machine. However, I'm not entirely clear on how this would work, or whether this is the right way to implement it. Since I do not have extensive knowledge of computer vision, I might be completely wrong and heading in the wrong direction. If that's the case, please let me know.
What I had in mind: the output of the UNet's last layer is an imagesize × imagesize map of raw class scores (logits, i.e. before any activation is applied) for class 0 or 1. I wanted to apply a support vector machine to all of these per-pixel vectors and then classify each pixel with the SVM's result.
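To make that plan concrete, here is a minimal NumPy sketch, assuming the network's last feature map has shape (H, W, C): it flattens the map into H·W pixel vectors, fits a linear soft-margin SVM on them by subgradient descent on the hinge loss, and labels each pixel by the sign of its SVM score. The random `features` array is a hypothetical stand-in for real UNet/ResNet/DeepLabV3 output, and the function names and hyperparameters are illustrative, not from any library.

```python
import numpy as np


def to_pixel_vectors(features, mask):
    """Flatten an (H, W, C) feature map and an (H, W) {0, 1} mask into
    (H*W, C) pixel vectors and (H*W,) labels in {-1, +1}."""
    h, w, c = features.shape
    X = features.reshape(h * w, c)
    y = np.where(mask.reshape(h * w) > 0, 1.0, -1.0)
    return X, y


def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=200):
    """Linear soft-margin SVM: full-batch subgradient descent on
    mean(max(0, 1 - y * (X @ w + b))) + (lam / 2) * ||w||^2."""
    n, d = X.shape
    weights = np.zeros(d)
    bias = 0.0
    for _ in range(epochs):
        margins = y * (X @ weights + bias)
        active = margins < 1  # pixels inside the margin or misclassified
        grad_w = lam * weights - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        weights -= lr * grad_w
        bias -= lr * grad_b
    return weights, bias


def predict_mask(features, weights, bias):
    """Classify every pixel by the sign of its SVM score."""
    h, w, c = features.shape
    scores = features.reshape(h * w, c) @ weights + bias
    return (scores > 0).astype(np.uint8).reshape(h, w)


# Hypothetical stand-in for a network's (H, W, C) feature map and its mask.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 32, 8))
mask = (features[..., 0] > 0).astype(np.uint8)

X, y = to_pixel_vectors(features, mask)
weights, bias = train_linear_svm(X, y)
pred = predict_mask(features, weights, bias)
print("pixel accuracy:", (pred == mask).mean())
```

In a real pipeline the feature map would come from a trained network (for example its penultimate layer), and the SVM would be fitted on pixels from many training images rather than a single one.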
My questions are:
- Using a sigmoid as the activation of the last layer and choosing hinge loss as the loss function feels a little different from what I wanted to do. Is that right?
- Can I use hinge loss for (binary) semantic segmentation as well?
- If the approaches above are all wrong, how can I implement the ensemble technique (semantic segmentation model + SVM)?
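For the first two questions, a minimal NumPy sketch of a per-pixel binary hinge loss may help, assuming the last layer outputs one raw score (logit) per pixel with no sigmoid applied and the mask uses {0, 1} labels. Hinge loss rewards positives at score ≥ +1 and negatives at ≤ -1; a sigmoid output p lies in (0, 1), so a positive pixel's loss 1 - p never reaches 0 and a negative pixel's loss 1 + p never drops below 1. The function names here are hypothetical.

```python
import numpy as np


def pixel_hinge_loss(logits, mask):
    """Mean binary hinge loss over all pixels.

    logits: (H, W) raw scores from the last layer, with NO sigmoid applied.
    mask:   (H, W) ground truth in {0, 1}.
    """
    y = 2.0 * mask - 1.0  # map {0, 1} -> {-1, +1}
    return np.maximum(0.0, 1.0 - y * logits).mean()


def logits_to_mask(logits):
    # Hinge loss pushes positives above +1 and negatives below -1,
    # so the decision threshold is 0, not 0.5 as with a sigmoid.
    return (logits > 0).astype(np.uint8)


logits = np.array([[2.0, -0.5], [0.3, -3.0]])
mask = np.array([[1, 0], [1, 0]])
print(pixel_hinge_loss(logits, mask))
print(logits_to_mask(logits))
```

Here only the two pixels with margin below 1 (scores -0.5 and 0.3) contribute to the loss; in a deep-learning framework the same expression would be attached to a linear final layer.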