
Is there a way to group the text response of Google's Document Text Detection API by block? I may have overlooked it in the documentation if a built-in solution exists. I am currently using Node.js to get the text from an image provided by the user. Here is my code:

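const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient({
  keyFilename: 'APIKey.json' // path to the service-account key file
});

// Runs inside an HTTP route handler (not shown); `res` is assumed to be
// that handler's Express-style response object.
client
  .documentTextDetection('image.jpg')
  .then(results => {
    res.send(results);
  })
  .catch(err => {
    res.send(err);
  });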

Thanks in advance.

NinoNextix

1 Answer


I'm not sure if there is a standardized way to do this, but the Vision API does give us everything we need to compose the block text, including the relevant breaks (see the Vision API break types). So we can enumerate each block and rebuild its text from its symbols.
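The nesting involved is: results → fullTextAnnotation → pages → blocks → paragraphs → words → symbols. A minimal sketch that walks that hierarchy and lists the words of each block, run against a hand-built mock instead of a real API response (the mock text is invented, and a real response carries many more fields, such as boundingBox and confidence):

```javascript
// Hand-built mock of a documentTextDetection result. The Node client resolves
// to an array whose first element is the AnnotateImageResponse; only the
// fields used below are included, and the text is invented for illustration.
const mockResults = [{
  fullTextAnnotation: {
    pages: [{
      blocks: [
        { paragraphs: [{ words: [
          { symbols: [{ text: 'H' }, { text: 'i' }] }
        ] }] },
        { paragraphs: [{ words: [
          { symbols: [{ text: 'O' }, { text: 'K' }] },
          { symbols: [{ text: 'g' }, { text: 'o' }] }
        ] }] }
      ]
    }]
  }
}];

// Walk results -> pages -> blocks -> paragraphs -> words -> symbols and
// return one array per block containing that block's words.
function wordsPerBlock(visionResults) {
  const blocks = [];
  for (const result of visionResults) {
    if (!result.fullTextAnnotation) continue; // no text detected in this image
    for (const page of result.fullTextAnnotation.pages) {
      for (const block of page.blocks) {
        const words = [];
        for (const paragraph of block.paragraphs) {
          for (const word of paragraph.words) {
            words.push(word.symbols.map(symbol => symbol.text).join(''));
          }
        }
        blocks.push(words);
      }
    }
  }
  return blocks;
}

console.log(wordsPerBlock(mockResults));
```

Each block keeps its own words, which is the grouping the question asks for; the break handling is what turns those words back into properly spaced text.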

I'm not handling two other break types (HYPHEN and SURE_SPACE), but they should be easy to add.

For example:

const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient({
    keyFilename: 'APIKey.json'
});

client
    .documentTextDetection('image.jpg')
    .then(results => {
        console.log("Text blocks: ", getTextBlocks(results));
    })
    .catch(err => {
        console.error("An error occurred: ", err);
    });

// Returns one { blockIndex, text } entry per block across all pages.
function getTextBlocks(visionResults) {
    const textBlocks = [];
    let blockIndex = 0;
    visionResults.forEach(result => {
        // fullTextAnnotation is null when no text was detected.
        if (!result.fullTextAnnotation) {
            return;
        }
        result.fullTextAnnotation.pages.forEach(page => {
            page.blocks.forEach(block => {
                textBlocks.push({ blockIndex: blockIndex++, text: getBlockText(block) });
            });
        });
    });
    return textBlocks;
}

// Rebuilds a block's text from its symbols, using each symbol's detected
// break to decide whether a space and/or a newline follows it.
function getBlockText(block) {
    let result = '';
    block.paragraphs.forEach(paragraph => {
        paragraph.words.forEach(word => {
            word.symbols.forEach(symbol => {
                result += symbol.text;
                // symbol.property can be null, so check it before reading detectedBreak.
                if (symbol.property && symbol.property.detectedBreak) {
                    const breakType = symbol.property.detectedBreak.type;
                    if (['EOL_SURE_SPACE', 'SPACE'].includes(breakType)) {
                        result += " ";
                    }
                    if (['EOL_SURE_SPACE', 'LINE_BREAK'].includes(breakType)) {
                        result += "\n"; // Perhaps use os.EOL for correctness.
                    }
                }
            });
        });
    });

    return result;
}
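A sketch of how the remaining break types could be covered, with the symbol loop driven by a single mapping function. It assumes, as the code above does, that the Node client delivers break types as strings, and the renderings chosen (notably re-adding "-" for HYPHEN, which marks an end-of-line hyphen that is not part of the symbol text) are my own assumptions, not something the API prescribes. The mockWord helper and the mock response are hypothetical, so it can run without credentials:

```javascript
// Text appended after a symbol for each break type (assumed renderings):
//   SPACE, SURE_SPACE          -> ' '   (SURE_SPACE is a very wide space)
//   EOL_SURE_SPACE, LINE_BREAK -> '\n'
//   HYPHEN                     -> '-\n' (hyphen is not in the symbol text)
//   UNKNOWN or anything else   -> ''
function breakToText(breakType) {
  switch (breakType) {
    case 'SPACE':
    case 'SURE_SPACE':
      return ' ';
    case 'EOL_SURE_SPACE':
    case 'LINE_BREAK':
      return '\n';
    case 'HYPHEN':
      return '-\n';
    default:
      return '';
  }
}

function getBlockText(block) {
  let text = '';
  for (const paragraph of block.paragraphs) {
    for (const word of paragraph.words) {
      for (const symbol of word.symbols) {
        text += symbol.text;
        // property and detectedBreak may both be null.
        if (symbol.property && symbol.property.detectedBreak) {
          text += breakToText(symbol.property.detectedBreak.type);
        }
      }
    }
  }
  return text;
}

function getTextBlocks(visionResults) {
  const textBlocks = [];
  let blockIndex = 0;
  for (const result of visionResults) {
    if (!result.fullTextAnnotation) continue; // no text in this image
    for (const page of result.fullTextAnnotation.pages) {
      for (const block of page.blocks) {
        textBlocks.push({ blockIndex: blockIndex++, text: getBlockText(block) });
      }
    }
  }
  return textBlocks;
}

// Hypothetical mock builder: one word whose last symbol carries breakType.
function mockWord(str, breakType) {
  const symbols = [...str].map(ch => ({ text: ch, property: null }));
  if (breakType) {
    symbols[symbols.length - 1].property = { detectedBreak: { type: breakType } };
  }
  return { symbols };
}

// Invented two-block response shaped like a documentTextDetection result.
const mockResults = [{
  fullTextAnnotation: {
    pages: [{
      blocks: [
        { paragraphs: [{ words: [mockWord('Hello', 'SPACE'), mockWord('wor', 'HYPHEN'), mockWord('ld', 'LINE_BREAK')] }] },
        { paragraphs: [{ words: [mockWord('Total', 'SURE_SPACE'), mockWord('42', 'LINE_BREAK')] }] }
      ]
    }]
  }
}];

console.log(getTextBlocks(mockResults));
```

For this mock, block 0 becomes "Hello wor-\nld\n" and block 1 becomes "Total 42\n". If you would rather rejoin hyphenated words, return '' for HYPHEN instead.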
Terry Lennox
  • Thanks! But I am getting an error message of the following: TypeError: Cannot read property 'detectedBreak' of null – NinoNextix Jul 18 '19 at 01:56
  • Apparently, symbol.property can also be null, so: if(symbol.property) before anything else. Thanks! – NinoNextix Jul 18 '19 at 02:48
  • @TerryLennox can you do this in php please? –  Sep 06 '19 at 07:29
  • Sure thing @overflow-stack. It'll be slightly different, though. Could you ask a new question? I think that would be better. – Terry Lennox Sep 06 '19 at 09:48
  • @TerryLennox can you try to solve this? https://stackoverflow.com/questions/57817740/how-can-i-detect-all-the-text-that-inside-a-block-with-google-vision-api/57818828#57818828 –  Sep 06 '19 at 16:01
  • @TerryLennox, good day to you, have you viewed the question yet? –  Sep 09 '19 at 09:08
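Regarding the "Cannot read property 'detectedBreak' of null" error from the first comment: on Node.js 14 or later, optional chaining expresses the same null check in one expression. A small sketch; getBreakType is a hypothetical helper name and the symbols are invented mocks covering a null property (the case from the comment), a null detectedBreak, and a present break:

```javascript
// Returns the detected break type of a Vision API symbol, or undefined when
// property or detectedBreak is null/missing. Optional chaining needs Node 14+.
function getBreakType(symbol) {
  return symbol.property?.detectedBreak?.type;
}

// Mock symbols for the three shapes described above.
const symbols = [
  { text: 'a', property: null },
  { text: 'b', property: { detectedBreak: null } },
  { text: 'c', property: { detectedBreak: { type: 'SPACE' } } }
];

console.log(symbols.map(getBreakType));
```

Inside getBlockText, the result can then be compared against the break-type strings directly, with no separate guard.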