Version 1 of the Google Cloud Vision API (beta) supports optical character recognition via TEXT_DETECTION requests. While recognition quality is good, the characters are returned without any hint of the original layout. Structured text (e.g., tables, receipts, columnar data) is therefore sometimes returned in the wrong order.
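For reference, here is a minimal sketch of the kind of TEXT_DETECTION request involved, using the v1 REST endpoint. The API key and image file name are placeholders, and the use of the `requests` library is just one way to send the call:

```python
import base64
import json
import requests

# Placeholders: substitute your own API key and image path.
API_KEY = "YOUR_API_KEY"
IMAGE_PATH = "receipt.png"

# The images:annotate endpoint expects the image content base64-encoded.
with open(IMAGE_PATH, "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [
        {
            "image": {"content": content},
            "features": [{"type": "TEXT_DETECTION"}],
        }
    ]
}

resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": API_KEY},
    data=json.dumps(body),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

# The first textAnnotation is the full detected text; the remaining entries
# are individual words with boundingPoly vertices, as a flat list with no
# table, row, or column structure.
annotations = resp.json()["responses"][0].get("textAnnotations", [])
if annotations:
    print(annotations[0]["description"])
```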
Is it possible to preserve document structure with the Google Cloud Vision API? Similar questions have been asked about Tesseract and hOCR; see, for example, [1] and [2]. The documentation [3] currently gives no information about TEXT_DETECTION options.
[1] How to preserve document structure in tesseract
[2] Tesseract - ambiguity in space and tab
[3] https://cloud.google.com/vision/