OCR diagonally written text

Question

After a broad keyword search of Google Scholar, image search, and the web, I cannot find anything related to OCR of diagonal text. There are a few close examples:

The page on OpenCV preprocessing of a document for skew is close, but it deals with skew of the entire page.
This document has no page skew and mixes horizontal and diagonal text. The question there is not about the diagonal text, but it is a good example of the layout.

So, presumably, functions for diagonal text fields do not exist in OpenCV. Is this true? And how are diagonal text fields handled?

score 0 · Accepted Answer · answered Dec 16 '13 at 04:19

It seems you want to perform OCR on a page with both horizontal and diagonal text. There is no straightforward solution in terms of OpenCV, but you could take a divide-and-conquer approach such as:

Partition the image according to prior knowledge about the document (common with forms) or the distribution of white regions (column spacing etc.).
Identify regions that may contain diagonal text (one method: after blurring and thresholding, diagonal text shows up as thick diagonal lines).
Rotate each such partition so its text is horizontal, then perform OCR on it.
Merge the results from the different partitions.
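
As a rough illustration of the "identify regions" step, here is a minimal sketch that estimates the orientation of the foreground pixels in one thresholded partition from its second-order image moments. It is plain Python so it runs without any library. The names `dominant_angle` and `looks_diagonal`, the list-of-rows mask format, and the 15 degree tolerance are assumptions made for this example, not part of OpenCV.

```python
import math


def dominant_angle(mask):
    """Orientation, in degrees, of the foreground pixels in a binary mask.

    `mask` is a list of rows holding 0/1 values (a blurred and thresholded
    partition). Rows grow downward, so a positive angle means the blob
    descends toward the right.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if len(points) < 2:
        raise ValueError("need at least two foreground pixels")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Central second-order moments of the foreground pixels.
    mu20 = sum((x - mean_x) ** 2 for x, _ in points) / n
    mu02 = sum((y - mean_y) ** 2 for _, y in points) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the principal axis of the pixel distribution.
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def looks_diagonal(angle, tolerance=15.0):
    """True when `angle` is more than `tolerance` degrees from both axes."""
    off_axis = abs(angle) % 90.0  # fold the angle into [0, 90)
    return min(off_axis, 90.0 - off_axis) > tolerance


# Synthetic partitions: one diagonal stroke, one horizontal stroke.
diagonal = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
horizontal = [[1] * 8 if y == 3 else [0] * 8 for y in range(8)]
angle_diag = dominant_angle(diagonal)
angle_horiz = dominant_angle(horizontal)
print(looks_diagonal(angle_diag), looks_diagonal(angle_horiz))
```

With OpenCV, the estimated angle would typically be fed to `cv2.getRotationMatrix2D` and `cv2.warpAffine` to straighten the partition before OCR. The sign convention is worth checking on a test image first.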

You can also try a brute-force approach: rotate the image through a range of angles and perform OCR at each angle. The results then have to be merged.
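
The brute-force idea can be sketched as follows, assuming the OCR engine reports some confidence value with each reading. The functions `best_reading` and `merge_readings` are hypothetical names, and `rotate` and `ocr` are placeholders for whatever rotation routine and OCR engine you actually use. The `fake_*` functions below only simulate them so that the sketch runs on its own.

```python
def best_reading(image, angles, rotate, ocr):
    """Run OCR at every angle and keep the most confident result.

    `rotate(image, angle)` returns a rotated image and `ocr(image)` returns
    a `(text, confidence)` pair. Both are supplied by the caller.
    Returns a `(text, confidence, angle)` tuple.
    """
    best = None
    for angle in angles:
        text, confidence = ocr(rotate(image, angle))
        if best is None or confidence > best[1]:
            best = (text, confidence, angle)
    return best


def merge_readings(regions, angles, rotate, ocr):
    """Merge step: keep the best reading for each named region."""
    return {name: best_reading(img, angles, rotate, ocr)
            for name, img in regions.items()}


# Stand-ins for a real rotation routine and OCR engine (assumptions).
TRUE_ANGLE = {"header": 45, "body": 0}


def fake_rotate(image, angle):
    return (image, angle)


def fake_ocr(rotated):
    name, angle = rotated
    confidence = 100 - abs(angle - TRUE_ANGLE[name])
    text = name.upper() if angle == TRUE_ANGLE[name] else "??"
    return text, confidence


regions = {"header": "header", "body": "body"}
results = merge_readings(regions, range(-60, 61, 15), fake_rotate, fake_ocr)
print(results)
```

Picking the single most confident angle per region is only one possible merge rule. Running it per partition rather than on the whole page keeps horizontal and diagonal text from competing for the same angle.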

"rotating the image by a range of angles and performing OCR on all of them," that could work but the header diagonal text is spaced closely - hence the reason for diagonal - and so the text in the preceding column and succeeding column is captured by the rectangular 'recognition box,' resulting in each field containing stray text. In the example form http://stackoverflow.com/questions/12455018/driver-logs-image-recognition-in-c-net I gave in the question, as an example, "Dallas, TX" and "Corsicana, TX" overlap. The OCR field will capture "Dallas, TX Lunch Corsica Lo" — forest.peterson, Dec 16 '13 at 19:54
