Google Vision OCR: DOCUMENT_TEXT_DETECTION produces strange results when TEXT_DETECTION is fine

Question

I'm playing around with the quick start guide (https://cloud.google.com/vision/docs/quickstart) and noticed wildly different results when using the same image for DOCUMENT_TEXT_DETECTION vs TEXT_DETECTION.

For reference, this is the image I'm using (plug this in for imageUri): https://storage.googleapis.com/random-resources/receipt.jpg

With TEXT_DETECTION, the description gives a good summary of the text in the image, but with DOCUMENT_TEXT_DETECTION the result is a bunch of text that appears nowhere in the image:

"OLZ-E\niNO N WHL\nL8' G7 WY NINId\n9E'V\nDG'S\n78' 8\nSD 177\nXel [ 101\n3L VON VW IS\nXVI\n11/ans\nas\na new set \"\nHe same time in more\nS8' 9p\nS8' 9p\n98' 9p\nGD' Ot\nGD'Or\nIIIA AHNIH\nIIIA ANNIH !\nIIIA AHNIH L\nLNI ALII I\nLNI ALII !\nement on to\nSee more money were more women\none more time we came as we are on memo sense we need some more money when we see some moment as a team\nwdE:6\nI Ssang\n200E YOay)\nLL |602 SOL Jed H3NNIS\nNNS\nEEZ a qel\nsame was one or more was a\nmoment\nto earn and a time when\nwe are seen\n909t-G88-9\nOIL 6 PD 'OISION VX: NVS\nINNIZAV SSN NVA 906\nBIY SWINd\nO\nSNOH\n"

Any ideas?
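
To make the comparison reproducible, here is a minimal sketch of how both feature types could be requested for the same imageUri through the REST endpoint the quick start uses. The helper names (build_request_body, extract_text, annotate) and the api_key parameter are hypothetical, and the sketch assumes API-key authentication and that entries in `responses` come back in the same order as the submitted requests. Each feature goes in its own request so that neither one influences the other.

```python
import json
import urllib.request

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"
IMAGE_URI = "https://storage.googleapis.com/random-resources/receipt.jpg"
FEATURE_TYPES = ("TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION")


def build_request_body(image_uri, feature_types=FEATURE_TYPES):
    """Build one batch body with a separate request per feature type."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature_type}],
            }
            for feature_type in feature_types
        ]
    }


def extract_text(response):
    """Return the full detected text from one entry of `responses`, or ''."""
    annotations = response.get("textAnnotations") or []
    if annotations:
        # The first annotation's description holds the whole detected text.
        return annotations[0].get("description", "")
    return response.get("fullTextAnnotation", {}).get("text", "")


def annotate(image_uri, api_key):
    """POST the batch and map each feature type to its detected text.

    Assumes an API key is an accepted credential for the project.
    """
    body = json.dumps(build_request_body(image_uri)).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        responses = json.load(reply)["responses"]
    # Assumption: responses are ordered like the requests that produced them.
    return dict(zip(FEATURE_TYPES, map(extract_text, responses)))
```

Calling annotate(IMAGE_URI, api_key) with a valid key should return a dict with one text string per feature type, which makes the two outputs easy to inspect side by side.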

As of 2019, in my tests of TEXT_DETECTION vs DOCUMENT_TEXT_DETECTION on receipts similar to yours, both OCR types work quite well, but TEXT_DETECTION gives me about 10% better results overall. I think the reason is that DOCUMENT_TEXT_DETECTION expects well-structured documents that are upright and mostly undistorted. - Arno, Nov 14 '19 at 11:18
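
The comment does not say how the "10% better" figure was measured. One hypothetical way to put a number on such a comparison is to score each OCR output against a hand-typed transcription of the receipt. The sketch below uses the standard library's difflib for that; the similarity helper and the choice of metric are assumptions for illustration, not the commenter's method.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse all runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def similarity(ocr_text, reference_text):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = SequenceMatcher(None, normalize(ocr_text), normalize(reference_text))
    return matcher.ratio()
```

Scoring the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION outputs against the same reference transcription would then give two comparable numbers for a given image.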
