Is it possible for either Microsoft Computer Vision API or Google's Cloud Vision API to get a location for objects?

Question

I am trying to develop an application that needs to know the location of tagged objects in an image. Knowing that there is a "piano" in an image is not enough; I need to know where that piano is in the image.

Both Microsoft's Computer Vision API and Google's Cloud Vision API provide some form of cropping suggestion or smart thumbnail generation service, which leads me to think that the location of certain objects is being detected. Is there a way to get that information (such as a bounding box around each detected object) from either API?

EDIT: I understand that both APIs can return the location of faces detected in an image. However, I am looking for the locations and sizes of every object in an image: cars, pianos, trees, people... anything.

Possible duplicate of [How to get a position of custom object on image using vision recognition api](http://stackoverflow.com/questions/38634409/how-to-get-a-position-of-custom-object-on-image-using-vision-recognition-api) — Nakilon, Dec 26 '16 at 05:33

score 0 · Answer 1 · answered Dec 23 '16 at 21:06

Microsoft's Computer Vision API offers no pixel coordinates for detected objects (see the returned features: https://dev.projectoxford.ai/docs/services/56f91f2d778daf23d8ec6739/operations/56f91f2e778daf14a499e1fa).

However, if you want to detect people, the Microsoft API can return the coordinates of face rectangles.

DaveStat


See my edit - I'm looking for more than just face locations, but I understand that these APIs may not be what I am looking for. – abagshaw Dec 23 '16 at 21:09
In that case, the Microsoft API is not suitable. – DaveStat Dec 23 '16 at 21:14
Any idea about the Google API or any other APIs? – abagshaw Dec 23 '16 at 21:15
Have you tried using the OpenCV package in Python (tutorial: https://www.intorobotics.com/how-to-detect-and-track-object-with-opencv/ )? Unfortunately, I have no clue about Google's API. Good luck. – DaveStat Dec 23 '16 at 21:18
I think OpenCV has to be trained before it can classify a wide range of objects. I am looking for a solution that can already recognize thousands of everyday objects and items. – abagshaw Dec 23 '16 at 21:27

score 0 · Answer 2 · answered Apr 25 '17 at 10:59

I don't know of any API that returns object coordinates at this time. I recommend using YOLO, which provides the coordinates of each detected object. You can either use a pre-trained model or train your own.

However, it is not an API, so you have to write some backend code to run it remotely.
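
To illustrate what working with YOLO coordinates can look like, here is a minimal sketch. It assumes the detector reports each box as a normalized center point plus width and height (the layout used by Darknet-style YOLO label files); some YOLO implementations return pixel corners directly, in which case no conversion is needed. The function name `yolo_to_pixel_box` is hypothetical.

```python
def yolo_to_pixel_box(cx, cy, w, h, image_width, image_height):
    """Convert a normalized center-format box to pixel corners.

    Assumption: cx, cy, w and h are fractions of the image size in [0, 1],
    with (cx, cy) marking the center of the box.
    Returns (left, top, right, bottom) in whole pixels.
    """
    left = (cx - w / 2) * image_width
    top = (cy - h / 2) * image_height
    right = (cx + w / 2) * image_width
    bottom = (cy + h / 2) * image_height
    return tuple(round(value) for value in (left, top, right, bottom))


if __name__ == "__main__":
    # A box centered in a 640x480 image that covers half of each dimension.
    print(yolo_to_pixel_box(0.5, 0.5, 0.5, 0.5, 640, 480))
```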

score 0 · Answer 3 · answered Oct 11 '19 at 12:00

Hope this helps: https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/

API:

url (POST): https://{yourvisionapp}.cognitiveservices.azure.com/vision/v2.0/detect
headers:
    Content-Type: application/json
    Ocp-Apim-Subscription-Key: {yourSubscriptionKey}
body: {"url":"yoururl"}
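
As a rough sketch, the request above could be assembled in Python with only the standard library. The helper names `build_detect_request` and `detect_objects` are hypothetical, and the resource name, key, and image URL are placeholders you would replace with your own values.

```python
import json
import urllib.request


def build_detect_request(resource_name, subscription_key, image_url):
    """Build (but do not send) the POST request described above."""
    endpoint = (
        f"https://{resource_name}.cognitiveservices.azure.com"
        "/vision/v2.0/detect"
    )
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def detect_objects(request, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `detect_objects(build_detect_request("yourvisionapp", "yourSubscriptionKey", "yoururl"))` would perform the network call; it needs a valid resource and key to succeed.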

sample response:-

{
    "objects": [
        {
            "rectangle": {
                "x": 460,
                "y": 79,
                "w": 141,
                "h": 258
            },
            "object": "window",
            "confidence": 0.508
        },
        {
            "rectangle": {
                "x": 180,
                "y": 240,
                "w": 299,
                "h": 182
            },
            "object": "Billiard table",
            "confidence": 0.635,
            "parent": {
                "object": "table",
                "confidence": 0.676
            }
        },
        {
            "rectangle": {
                "x": 8,
                "y": 11,
                "w": 497,
                "h": 416
            },
            "object": "room",
            "confidence": 0.547
        }
    ],
    "requestId": "f8aafd95-d17d-4088-a34b-ad616f9cde4a",
    "metadata": {
        "width": 640,
        "height": 427,
        "format": "Jpeg"
    }
}
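
A response shaped like the sample above can be turned into labeled corner coordinates with a few lines of Python. This is a sketch, not part of the API: `extract_boxes` and `min_confidence` are hypothetical names, and it assumes `x` and `y` give the top-left corner of each rectangle in pixels, which is consistent with the sample values fitting inside the 640x427 image.

```python
def extract_boxes(response, min_confidence=0.0):
    """Return one dict per detected object with its label and pixel corners.

    Assumption: rectangle x/y is the top-left corner, w/h the size in pixels.
    Objects below min_confidence are skipped.
    """
    boxes = []
    for item in response.get("objects", []):
        if item["confidence"] < min_confidence:
            continue
        rect = item["rectangle"]
        boxes.append({
            "label": item["object"],
            "confidence": item["confidence"],
            "left": rect["x"],
            "top": rect["y"],
            "right": rect["x"] + rect["w"],
            "bottom": rect["y"] + rect["h"],
        })
    return boxes


# Trimmed copy of the sample response, keeping two of the three objects.
sample = {
    "objects": [
        {"rectangle": {"x": 460, "y": 79, "w": 141, "h": 258},
         "object": "window", "confidence": 0.508},
        {"rectangle": {"x": 180, "y": 240, "w": 299, "h": 182},
         "object": "Billiard table", "confidence": 0.635},
    ],
    "metadata": {"width": 640, "height": 427, "format": "Jpeg"},
}

if __name__ == "__main__":
    for box in extract_boxes(sample, min_confidence=0.6):
        print(box["label"], box["left"], box["top"], box["right"], box["bottom"])
```

Filtering by confidence is optional; the scores in the sample are fairly low, so a threshold that suits one image may discard valid objects in another.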

score 0 · Answer 4 · answered Mar 05 '20 at 04:28

2020 UPDATE:

This question is a few years old, but the Microsoft Azure Computer Vision API is now able to return bounding boxes for objects that are detected in an image. Here is a sample in Python. Other languages are available as well.

Computer Vision documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/

Computer Vision SDK: https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-vision-computervision/?view=azure-python

Computer Vision API: https://westus.dev.cognitive.microsoft.com/docs/services/5cd27ec07268f6c679a3e641/operations/56f91f2e778daf14a499f21b