I know there are many image recognition APIs, such as Clarifai, Watson, Google Cloud Vision, and Microsoft Cognitive Services, that recognize image content. The response from these services is simple JSON containing tags with confidence scores, for example:
{
  "man": 0.9969295263290405,
  "portrait": 0.9949591159820557,
  "face": 0.9261120557785034
}
The problem is that I need to know not only what is in the image but also where each object is located. Some of these APIs offer that feature, but only for face detection.
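To make it concrete, here is a minimal Python sketch of the kind of response I am hoping for and how I would consume it. The list layout, the "box" field, its pixel coordinates, and the `locate` helper are all my own assumptions for illustration, not the format of any of the APIs mentioned above.

```python
import json

# Hypothetical response: each tag carries a score plus a bounding box.
# The "box" field (left, top, width, height in pixels) is an assumed format,
# not something any of the services above is known to return.
SAMPLE_RESPONSE = """
[
  {"tag": "man",  "score": 0.9969295263290405,
   "box": {"left": 40, "top": 10, "width": 200, "height": 380}},
  {"tag": "face", "score": 0.9261120557785034,
   "box": {"left": 95, "top": 30, "width": 80, "height": 90}}
]
"""

def locate(response_text, min_score=0.5):
    """Return (tag, (left, top, right, bottom)) for each detection at or above min_score."""
    results = []
    for item in json.loads(response_text):
        if item["score"] < min_score:
            continue
        b = item["box"]
        # Convert width/height into right/bottom corner coordinates.
        corners = (b["left"], b["top"], b["left"] + b["width"], b["top"] + b["height"])
        results.append((item["tag"], corners))
    return results

for tag, corners in locate(SAMPLE_RESPONSE):
    print(tag, corners)
```

In other words, I want a position for every recognized tag, not just for faces.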
So, does anyone know whether such an API exists, or do I need to train my own Haar cascades in OpenCV for every object?
I would be very grateful for any information.