I have a scanned Word document that contains a table. I need to extract the contents of every cell (rectangle) in the scanned image. For example, take a look at this image:
Given that image, I need to retrieve, in C#, an array of rectangles (coordinates), one for each cell in the image. I'm using AForge, but it is not a requirement.
What I've tried:
I've tried using blob processing. This works to some extent, but not always: on some images it retrieves 80-90% of the cells, while on others it retrieves only one blob (the whole image).
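For reference, here is a minimal sketch of the blob idea in plain C# (no AForge), assuming the page has already been binarized into a hypothetical `bool[,] ink` array where `true` marks a dark pixel. Rather than looking for ink blobs, it flood-fills the white regions and returns the bounding box of every region that does not touch the image border; on a clean grid, each such region is one cell interior. The class, method, and `minArea` parameter are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

public static class CellBlobSketch
{
    // ink[y, x] == true means the pixel is dark (part of a ruling line or of text).
    // Returns the bounding box of every 4-connected region of non-ink pixels that
    // does not touch the image border, i.e. candidate cell interiors.
    // minArea is a hypothetical filter for tiny regions such as the inside of an "o".
    public static Rectangle[] FindEnclosedRegions(bool[,] ink, int minArea = 1)
    {
        int height = ink.GetLength(0);
        int width = ink.GetLength(1);
        var visited = new bool[height, width];
        var result = new List<Rectangle>();
        var queue = new Queue<(int X, int Y)>();

        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
            {
                if (ink[y, x] || visited[y, x]) continue;

                // Breadth-first flood fill of one white region.
                int minX = x, maxX = x, minY = y, maxY = y, area = 0;
                bool touchesBorder = false;
                visited[y, x] = true;
                queue.Enqueue((x, y));
                while (queue.Count > 0)
                {
                    var (cx, cy) = queue.Dequeue();
                    area++;
                    minX = Math.Min(minX, cx); maxX = Math.Max(maxX, cx);
                    minY = Math.Min(minY, cy); maxY = Math.Max(maxY, cy);
                    if (cx == 0 || cy == 0 || cx == width - 1 || cy == height - 1)
                        touchesBorder = true;

                    // Visit the four direct neighbours.
                    foreach (var (nx, ny) in new[] { (cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1) })
                    {
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                        if (ink[ny, nx] || visited[ny, nx]) continue;
                        visited[ny, nx] = true;
                        queue.Enqueue((nx, ny));
                    }
                }

                // A region connected to the image border is page background, not a cell.
                if (!touchesBorder && area >= minArea)
                    result.Add(new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1));
            }
        return result.ToArray();
    }
}
```

This also illustrates why the approach is fragile on scans: a single-pixel gap in a ruling line joins two neighbouring cells into one region, and a gap in the outer border lets the cells behind it leak into the page background. The enclosed counters of letters such as "o" or "e" also form small regions, which is what the hypothetical `minArea` threshold is meant to discard.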
I've also tried applying the following filters: Grayscale -> Otsu thresholding -> Canny edge detection, and then processing the resulting image with the Hough line transform. I was hoping this would keep the straight lines black and turn everything else white, which would make the task much easier with a custom algorithm. However, it either detects additional lines (probably from the text) or misses some of the lines between cells.
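The goal described above, keeping only the straight lines, can also be approached without Hough by run-length filtering, which is a different technique from the one tried here. The sketch below, in plain C# and assuming a binarized `bool[,] ink` input (a hypothetical representation), keeps an ink pixel only if it lies on a horizontal or vertical run of at least `minRunLength` consecutive ink pixels, so short text strokes are discarded. The names and the threshold are illustrative assumptions.

```csharp
public static class LineMaskSketch
{
    // ink[y, x] == true means the pixel is dark after binarization.
    // Returns a mask that keeps only ink pixels lying on a horizontal or vertical
    // run of at least minRunLength consecutive ink pixels; shorter runs are dropped.
    public static bool[,] KeepLongRuns(bool[,] ink, int minRunLength)
    {
        int height = ink.GetLength(0);
        int width = ink.GetLength(1);
        var lines = new bool[height, width];

        // Horizontal runs: scan each row and copy runs that are long enough.
        for (int y = 0; y < height; y++)
        {
            int x = 0;
            while (x < width)
            {
                if (!ink[y, x]) { x++; continue; }
                int start = x;
                while (x < width && ink[y, x]) x++;
                if (x - start >= minRunLength)
                    for (int i = start; i < x; i++) lines[y, i] = true;
            }
        }

        // Vertical runs: same scan, column by column.
        for (int x = 0; x < width; x++)
        {
            int y = 0;
            while (y < height)
            {
                if (!ink[y, x]) { y++; continue; }
                int start = y;
                while (y < height && ink[y, x]) y++;
                if (y - start >= minRunLength)
                    for (int i = start; i < y; i++) lines[i, x] = true;
            }
        }
        return lines;
    }
}
```

This only works well if the scan is nearly straight: a skewed page breaks long lines into shorter runs, so the threshold would need to stay below the expected line length, or the page would need to be deskewed first. Long underlines or dashes in the text would also survive the filter.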
I've tried different combinations of filters in both attempts, without success. How can I achieve this?