
I am trying to find the coordinates of text in a scanned image. The scanned image contains a lot of text; I need to convert that image data to text and then get the coordinates of each piece of text. The coordinates represent bounding boxes (X, Y axis, height and width) describing where the text is.

I am using Microsoft ProjectOxford Vision OCR:

    using Microsoft.ProjectOxford.Vision;
    using Microsoft.ProjectOxford.Vision.Contract;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;

    namespace TextExtraction
    {
        class Program
        {
            const string API_key = "<<Key>>";
            const string API_location =
                "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0";

            static void Main(string[] args)
            {
                string imgToAnalyze = @"C:\Users\abhis\Desktop\image.jpg";
                HandwritingExtraction(imgToAnalyze, false);

                Console.ReadLine();
            }

            public static void PrintResults(string[] res)
            {
                foreach (string r in res)
                    Console.WriteLine(r);
                Console.ReadLine();
            }

            public static void HandwritingExtraction(string fname, bool wrds)
            {
                Task.Run(async () =>
                {
                    string[] res = await HandwritingExtractionCore(fname, wrds);
                    PrintResults(res);
                }).Wait();
            }

            public static async Task<string[]> HandwritingExtractionCore(string fname, bool wrds)
            {
                VisionServiceClient client = new VisionServiceClient(API_key, API_location);
                string[] textres = null;

                if (File.Exists(fname))
                    using (Stream stream = File.OpenRead(fname))
                    {
                        HandwritingRecognitionOperation op =
                            await client.CreateHandwritingRecognitionOperationAsync(stream);
                        HandwritingRecognitionOperationResult res =
                            await client.GetHandwritingRecognitionOperationResultAsync(op);

                        textres = GetExtracted(res, wrds);
                    }

                return textres;
            }

            public static string[] GetExtracted(HandwritingRecognitionOperationResult res, bool wrds)
            {
                List<string> items = new List<string>();

                foreach (HandwritingTextLine l in res.RecognitionResult.Lines)
                    if (wrds)
                        items.AddRange(GetWords(l));
                    else
                        items.Add(GetLineAsString(l));

                return items.ToArray();
            }

            public static List<string> GetWords(HandwritingTextLine line)
            {
                List<string> words = new List<string>();

                foreach (HandwritingTextWord w in line.Words)
                    words.Add(w.Text);

                return words;
            }

            public static string GetLineAsString(HandwritingTextLine line)
            {
                List<string> words = GetWords(line);
                return words.Count > 0 ? string.Join(" ", words) : string.Empty;
            }
        }
    }

Expected output: the recognized text together with its coordinates (x, y, height, width).
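For reference, the handwriting contract classes appear to carry these coordinates already. A minimal sketch of a variant of GetWords that also emits them, under the assumption that HandwritingTextWord exposes Text and an int[] BoundingBox of eight corner values (as the JSON output below suggests), could sit alongside GetWords in the Program class above:

    // Sketch only, under the assumption that HandwritingTextWord exposes an int[]
    // BoundingBox holding x1,y1,x2,y2,x3,y3,x4,y4 corner coordinates (see the JSON below).
    public static List<string> GetWordsWithBoxes(HandwritingTextLine line)
    {
        List<string> words = new List<string>();

        foreach (HandwritingTextWord w in line.Words)
            words.Add($"{w.Text} [{string.Join(", ", w.BoundingBox)}]");

        return words;
    }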

Input image

JSON output:

{ "status": "Succeeded", "succeeded": true, "failed": false, "finished": true, "recognitionResults": [ { "page": 1, "clockwiseOrientation": 359.62, "width": 505, "height": 399, "unit": "pixel", "lines": [ { "boundingBox": [ 224, 58, 380, 57, 381, 74, 225, 75 ], "text": "GOVERNMENT OF INDIA", "words": [ { "boundingBox": [ 229, 59, 321, 58, 320, 75, 229, 75 ], "text": "GOVERNMENT" }, { "boundingBox": [ 324, 58, 341, 58, 341, 75, 323, 75 ], "text": "OF" }, { "boundingBox": [ 344, 58, 381, 58, 381, 75, 344, 75 ], "text": "INDIA" } ] }, { "boundingBox": [ 211, 159, 429, 160, 428, 180, 210, 178 ], "text": "FH faPet/ DOB: 27/07/1982", "words": [ { "boundingBox": [ 225, 160, 243, 160, 243, 179, 225, 179 ], "text": "FH" }, { "boundingBox": [ 247, 160, 286, 160, 286, 179, 247, 179 ], "text": "faPet/" }, { "boundingBox": [ 290, 160, 333, 160, 333, 179, 290, 179 ], "text": "DOB:" }, { "boundingBox": [ 337, 160, 428, 162, 428, 180, 337, 179 ], "text": "27/07/1982" } ] }, { "boundingBox": [ 209, 192, 313, 190, 314, 208, 210, 210 ], "text": "you / MALE", "words": [ { "boundingBox": [ 214, 192, 247, 192, 246, 209, 214, 210 ], "text": "you" }, { "boundingBox": [ 254, 192, 260, 192, 260, 209, 254, 209 ], "text": "/" }, { "boundingBox": [ 264, 192, 314, 192, 313, 208, 263, 209 ], "text": "MALE" } ] }, { "boundingBox": [ 201, 314, 351, 313, 352, 330, 202, 331 ], "text": "66 66 6666 6666", "words": [ { "boundingBox": [ 204, 315, 225, 314, 225, 330, 204, 331 ], "text": "66" }, { "boundingBox": [ 229, 314, 251, 314, 251, 330, 229, 330 ], "text": "66" }, { "boundingBox": [ 255, 314, 301, 314, 301, 330, 255, 330 ], "text": "6666" }, { "boundingBox": [ 307, 314, 352, 314, 351, 331, 306, 330 ], "text": "6666" } ] } ] } ] }

Tony
  • Do not [repost](https://stackoverflow.com/questions/56412472/how-to-get-coordinates-of-text-from-scanned-image) a question! If you want to add to it, you can always edit the question! – TaW Jun 02 '19 at 07:23
  • @TaW OK, can you help me out with the solution? – Tony Jun 02 '19 at 07:26
  • Sorry, no, I don't know that library. – TaW Jun 02 '19 at 07:57

2 Answers


I guess you are using the Microsoft Azure Computer Vision service from a C# app. Here is a detailed link for your question:

https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp-print-text

Inside the contentString, the response is something like:

"language": "en",
    "textAngle": -1.5000000000000335,
    "orientation": "Up",
    "regions": [
        {
            "boundingBox": "154,49,351,575",
            "lines": [
                {
                    "boundingBox": "165,49,340,117",
                    "words": [
                        {
                            "boundingBox": "165,49,63,109",
                            "text": "A"
                        },
                        {
                            "boundingBox": "261,50,244,116",
                            "text": "GOAL"
                        }
                    ]
                },
                {

I did some projects with Azure and C#, but your code doesn't look very familiar to me.

I would suggest inspecting all the data inside textres or res (in your code); I think it contains the same structure as shown in the string above.
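For example, since the OCR endpoint's boundingBox is a comma-separated "x,y,width,height" string, a minimal sketch of turning one into a rectangle (assuming System.Drawing for Rectangle) could be:

    // Sketch only: parses the OCR API's "x,y,width,height" boundingBox string
    // (as in the JSON fragment above) into a System.Drawing.Rectangle.
    using System.Drawing;
    using System.Linq;

    static class OcrBoundingBox
    {
        public static Rectangle Parse(string boundingBox)
        {
            // e.g. "165,49,63,109" -> x=165, y=49, width=63, height=109
            int[] p = boundingBox.Split(',').Select(int.Parse).ToArray();
            return new Rectangle(p[0], p[1], p[2], p[3]);
        }
    }

    // Usage: Rectangle r = OcrBoundingBox.Parse("165,49,63,109");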

Dr Yuan Shenghai
  • Dr Yuan, I saw that example but I am unable to get the source code of "Read both printed and handwritten text in images" on this site `https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/`. Please let me know if you have any idea about source code for scanned images – Tony Jun 02 '19 at 14:31
  • In your code, see what's inside the textres or res – Dr Yuan Shenghai Jun 02 '19 at 14:50
  • Yuan, I am currently using this API: `https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/read/core/asyncBatchAnalyze`. Can you please suggest how I can use this **boundingBox: [ 224, 58, 380, 57, 381, 74, 225, 75 ]** to create a rectangle – Tony Jun 04 '19 at 11:37
  • @Tony I cannot open your link. To plot a bounding box, search for the line `overlay.SetBinding(Border.MarginProperty, wordBoxOverlay.CreateWordPositionBinding());` in the Microsoft OCR sample https://github.com/microsoft/Windows-universal-samples/blob/master/Samples/OCR/cs/OcrFileImage.xaml.cs; there you should get some idea of how to use it – Dr Yuan Shenghai Jun 04 '19 at 13:12

Firstly, note that there are two different APIs for text recognition in Microsoft Cognitive Services. Dr. Yuan's output is from the OCR API, which has broader language coverage, whereas Tony's output shows that he's calling the newer and improved Read API.

Secondly, note that the client SDK referenced in the code sample above, Microsoft.ProjectOxford.Vision, is deprecated, and you will want to switch to its replacement, Microsoft.Azure.CognitiveServices.Vision.ComputerVision, the sample for which you will find here.
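For completeness, a rough sketch of calling the Read API through that newer SDK might look like the following; it assumes a package version that exposes ReadInStreamAsync and GetReadResultAsync (older builds name these BatchReadFileInStreamAsync and GetReadOperationResultAsync), and the key, endpoint and image path are placeholders:

    // Sketch only: submit an image to the Read API and print each line's text
    // together with its 8-number bounding box (four x,y corner pairs).
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
    using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

    class ReadApiSketch
    {
        static async Task Main()
        {
            var client = new ComputerVisionClient(new ApiKeyServiceClientCredentials("<<Key>>"))
            {
                Endpoint = "https://westcentralus.api.cognitive.microsoft.com"
            };

            using (Stream stream = File.OpenRead(@"C:\Users\abhis\Desktop\image.jpg"))
            {
                // Submit the image; the operation ID is the last segment of Operation-Location.
                var headers = await client.ReadInStreamAsync(stream);
                string operationId = headers.OperationLocation.Split('/').Last();

                // Poll until the asynchronous read operation finishes.
                ReadOperationResult result;
                do
                {
                    await Task.Delay(1000);
                    result = await client.GetReadResultAsync(Guid.Parse(operationId));
                }
                while (result.Status == OperationStatusCodes.Running ||
                       result.Status == OperationStatusCodes.NotStarted);

                foreach (ReadResult page in result.AnalyzeResult.ReadResults)
                    foreach (Line line in page.Lines)
                        Console.WriteLine($"{line.Text}: [{string.Join(", ", line.BoundingBox)}]");
            }
        }
    }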

And finally, the answer to the specific question. The location of text recognized in your document is represented in the boundingBox field. So for your example output JSON, the line of text GOVERNMENT OF INDIA is bounded by the coordinates (224, 58), (380, 57), (381, 74) and (225, 75), representing the four corners. It is not in an x,y,width,height format, to allow for rotation. Note that the unit for the bounding box is also included in the JSON (in your case, pixels). The location of each word within the line is also in your response JSON, if that's what you're after.
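If you do want an axis-aligned rectangle, a minimal sketch of collapsing those four corners into one (assuming System.Drawing for Rectangle; the comment thread below spells out the same idea) could be:

    // Sketch only: convert the Read API's 8-number boundingBox
    // [x1,y1,x2,y2,x3,y3,x4,y4] into an axis-aligned rectangle.
    using System.Drawing;
    using System.Linq;

    static class ReadBoundingBox
    {
        public static Rectangle ToRectangle(int[] boundingBox)
        {
            // Even indices are x coordinates, odd indices are y coordinates.
            var xs = boundingBox.Where((v, i) => i % 2 == 0).ToArray();
            var ys = boundingBox.Where((v, i) => i % 2 == 1).ToArray();

            int x = xs.Min();
            int y = ys.Min();
            return new Rectangle(x, y, xs.Max() - x, ys.Max() - y);
        }
    }

    // Usage: var rect = ReadBoundingBox.ToRectangle(new[] { 224, 58, 380, 57, 381, 74, 225, 75 });
    // -> x=224, y=57, width=157, height=18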

cthrash
  • Currently I am getting boundingBox: [ 224, 58, 380, 57, 381, 74, 225, 75 ] as the response and need to convert it into a rectangle (x, y, height, width); please suggest how I can achieve this – Tony Jun 04 '19 at 14:44
  • `var xval = line.BoundingBox.Where((v,i) => (i&1) == 0); var yval = line.BoundingBox.Where((v,i) => (i&1) == 1); var x = xval.Min(); var y = yval.Min(); var width = xval.Max() - x; var height = yval.Max() - y; var rect = new Rectangle(x,y,width,height);` – cthrash Jun 04 '19 at 16:10
  • Can you help me out with these parameters: boundingBox: [X top left, Y top left, X top right, Y top right, X bottom right, Y bottom right, X bottom left, Y bottom left]? Since I have these points, what are the v,i values? – Tony Jun 04 '19 at 16:28
  • v and i are values auto-assigned in the LINQ statement; v in this case is the value, i the index. – cthrash Jun 04 '19 at 16:34
  • I have the boundingBox: [X top left, Y top left, X top right, Y top right, X bottom right, Y bottom right, X bottom left, Y bottom left] values; using these, how can I create the rectangle box, since I can see you have used line – Tony Jun 04 '19 at 16:43