Google Vision API Text Detection Display Words by Block

Question

Is there a way to group the text response of Google's Document Text Detection API by block? If the documentation provides a solution, I may have overlooked it. I am currently using Node.js to get the text from an image provided by the user. Here is my code:

const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient({
  keyFilename: 'APIKey.json'
});

// Assumed to run inside an HTTP route handler (e.g. Express), where `res` is the response object.
client
  .documentTextDetection('image.jpg')
  .then(results => {
    res.send(results);
  })
  .catch(err => {
    res.send(err);
  });

Thanks in advance.

Terry Lennox · Accepted Answer · 2019-07-18T07:04:29.533

I'm not sure if there is a standardized way to do this, but the Vision API does give us everything we need to compose the block text, including the relevant breaks (see the Vision API break types). So we can enumerate each block and build its text.
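To see what "everything we need" looks like, here is a minimal sketch of the nesting the answer relies on: `fullTextAnnotation.pages[].blocks[].paragraphs[].words[].symbols[]`, where each symbol has a `text` and may carry `property.detectedBreak`. The object below is hand-built, hypothetical data containing only those fields (a real response also includes bounding boxes, confidence scores and more), and `wordsByBlock` is an illustrative helper, not part of the client library. It groups the words of each block, which is closest to what the question title asks for.

```javascript
// Hypothetical stand-in for a response's fullTextAnnotation, reduced to the
// fields used here.
const sampleAnnotation = {
  pages: [{
    blocks: [
      { paragraphs: [{ words: [
        { symbols: [{ text: 'H' }, { text: 'i', property: { detectedBreak: { type: 'LINE_BREAK' } } }] },
      ] }] },
      { paragraphs: [{ words: [
        { symbols: [{ text: 'O' }, { text: 'K', property: { detectedBreak: { type: 'SPACE' } } }] },
        { symbols: [{ text: 'g' }, { text: 'o', property: { detectedBreak: { type: 'LINE_BREAK' } } }] },
      ] }] },
    ],
  }],
};

// Walk pages -> blocks -> paragraphs -> words -> symbols and return, for each
// block, the list of its words (each word is its symbols' text joined together).
function wordsByBlock(fullTextAnnotation) {
  const blocks = [];
  for (const page of fullTextAnnotation.pages) {
    for (const block of page.blocks) {
      const words = [];
      for (const paragraph of block.paragraphs) {
        for (const word of paragraph.words) {
          words.push(word.symbols.map(symbol => symbol.text).join(''));
        }
      }
      blocks.push(words);
    }
  }
  return blocks;
}

console.log(wordsByBlock(sampleAnnotation));
```

Breaks are ignored here because each word is already a separate array entry; they only matter when you want the block as one string, as in the answer's code.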

There are a couple of other break types I'm not accounting for (HYPHEN, SURE_SPACE), but I think it should be easy to add these.

For example:

const vision = require('@google-cloud/vision');

const client = new vision.ImageAnnotatorClient({
    keyFilename: 'APIKey.json'
});

client
    .documentTextDetection('image.jpg')
    .then(results => {
        console.log("Text blocks: ", getTextBlocks(results));
    })
    .catch(err => {
        console.error("An error occurred: ", err);
    });

// Build one { blockIndex, text } entry per block, across all pages of all responses.
function getTextBlocks(visionResults) {
    let textBlocks = [];
    let blockIndex = 0;
    visionResults.forEach(result => {
        // fullTextAnnotation can be null, e.g. when no text was detected.
        if (!result.fullTextAnnotation) {
            return;
        }
        result.fullTextAnnotation.pages.forEach(page => {
            textBlocks = textBlocks.concat(page.blocks.map(block => {
                return { blockIndex: blockIndex++, text: getBlockText(block) };
            }));
        });
    });
    return textBlocks;
}

// Concatenate a block's symbols, turning detected breaks into spaces and newlines.
function getBlockText(block) {
    let result = '';
    block.paragraphs.forEach(paragraph => {
        paragraph.words.forEach(word => {
            word.symbols.forEach(symbol => {
                result += symbol.text;
                // symbol.property can be null, so check it before reading detectedBreak.
                if (symbol.property && symbol.property.detectedBreak) {
                    const breakType = symbol.property.detectedBreak.type;
                    if (['EOL_SURE_SPACE', 'SPACE'].includes(breakType)) {
                        result += " ";
                    }
                    if (['EOL_SURE_SPACE', 'LINE_BREAK'].includes(breakType)) {
                        result += "\n"; // Perhaps use os.EOL for correctness.
                    }
                }
            });
        });
    });

    return result;
}
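The answer leaves out HYPHEN and SURE_SPACE. One way to add them, sketched here with hypothetical helpers (`breakToText`, `symbolsToText`) rather than anything the library provides, is a single mapping from break type to appended text. It assumes, based on the break-type reference, that SURE_SPACE is a wide space and that a HYPHEN break stands for an end-of-line hyphen that is not included in `symbol.text`; whether to restore that hyphen or rejoin the word is left to the caller through the made-up `joinHyphenated` option.

```javascript
// Hypothetical helper: text to append after a symbol for a given break type.
// SPACE, EOL_SURE_SPACE and LINE_BREAK produce the same output as the answer.
function breakToText(breakType, { joinHyphenated = false } = {}) {
  switch (breakType) {
    case 'SPACE':
    case 'SURE_SPACE': // a wide space; treated like a normal space here
      return ' ';
    case 'EOL_SURE_SPACE':
      return ' \n';
    case 'LINE_BREAK':
      return '\n';
    case 'HYPHEN':
      // Assumed not to be part of symbol.text: restore it or rejoin the word.
      return joinHyphenated ? '' : '-\n';
    default: // 'UNKNOWN' or no break at all
      return '';
  }
}

// Concatenate symbols, appending whatever each symbol's detected break maps to.
function symbolsToText(symbols, options) {
  return symbols
    .map(symbol => {
      const detectedBreak = symbol.property && symbol.property.detectedBreak;
      return symbol.text + breakToText(detectedBreak ? detectedBreak.type : undefined, options);
    })
    .join('');
}

// Hypothetical symbols for a word split across two lines, followed by "ok".
const symbols = [
  { text: 'c' }, { text: 'o', property: { detectedBreak: { type: 'HYPHEN' } } },
  { text: 'd' }, { text: 'e', property: { detectedBreak: { type: 'SURE_SPACE' } } },
  { text: 'o' }, { text: 'k', property: { detectedBreak: { type: 'LINE_BREAK' } } },
];

console.log(JSON.stringify(symbolsToText(symbols)));                          // "co-\nde ok\n"
console.log(JSON.stringify(symbolsToText(symbols, { joinHyphenated: true }))); // "code ok\n"
```

Keeping all break handling in one `switch` means the inner loop of `getBlockText` only has to append `symbol.text` plus `breakToText(...)` for each symbol.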

Thanks! But I am getting the following error: TypeError: Cannot read property 'detectedBreak' of null — NinoNextix, Jul 18 '19 at 01:56
Apparently symbol.property can also be null, so check if (symbol.property) before anything else. Thanks! — NinoNextix, Jul 18 '19 at 02:48
Sure thing @overflow-stack, it'll be slightly different. Could you ask a new question? I think that would be better for this. — Terry Lennox, Sep 06 '19 at 09:48
@TerryLennox can you try solve this? https://stackoverflow.com/questions/57817740/how-can-i-detect-all-the-text-that-inside-a-block-with-google-vision-api/57818828#57818828 — , Sep 06 '19 at 16:01
@TerryLennox, good day to you, have you viewed the question yet? — , Sep 09 '19 at 09:08
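The null `symbol.property` from the comments above, as well as a missing `fullTextAnnotation` (which I would expect for an image with no text), can both be handled with optional chaining. The sketch below is a compact variant of the answer's functions, assuming Node.js 14 or later for `?.` and `??`. The names `getTextBlocksSafe`, `blockText` and `mockResults` are illustrative only, and the mock array merely imitates the shape of what `documentTextDetection` resolves with, so the logic can be exercised without calling the API.

```javascript
// Compact variant of getTextBlocks: skips responses without text and tolerates
// null/missing nested fields instead of throwing.
function getTextBlocksSafe(visionResults) {
  let blockIndex = 0;
  return visionResults
    .flatMap(result => result?.fullTextAnnotation?.pages ?? [])
    .flatMap(page => page.blocks ?? [])
    .map(block => ({ blockIndex: blockIndex++, text: blockText(block) }));
}

// Same break handling as the answer's getBlockText.
function blockText(block) {
  let text = '';
  for (const paragraph of block.paragraphs ?? []) {
    for (const word of paragraph.words ?? []) {
      for (const symbol of word.symbols ?? []) {
        text += symbol.text;
        const breakType = symbol.property?.detectedBreak?.type; // property may be null
        if (breakType === 'SPACE' || breakType === 'EOL_SURE_SPACE') text += ' ';
        if (breakType === 'LINE_BREAK' || breakType === 'EOL_SURE_SPACE') text += '\n';
      }
    }
  }
  return text;
}

// Hypothetical data shaped like the client's resolved array.
const mockResults = [
  { fullTextAnnotation: { pages: [{ blocks: [
    { paragraphs: [{ words: [
      { symbols: [{ text: 'A', property: null }, { text: 'B', property: { detectedBreak: { type: 'SPACE' } } }] },
      { symbols: [{ text: 'C', property: { detectedBreak: { type: 'LINE_BREAK' } } }] },
    ] }] },
  ] }] } },
  { fullTextAnnotation: null }, // e.g. an image with no detected text
];

console.log(getTextBlocksSafe(mockResults));
```

The `blockIndex` counter still runs across all pages and responses, matching the original `getTextBlocks`.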
