Algorithms to classify and extract crossword grids from an image

Question

I am looking for algorithms to, given an image containing a crossword

crop the image to just the crossword
distinguish between regular and barred crosswords
extract the grid size and the positions of the black squares/bars

The crossword itself can be assumed to be regular (i.e. I am interested in crosswords that have been generated by some program and published as an image, rather than scanned paper-based crosswords), and I would like the program to run without needing any inputs other than the image bitmap.

I can think of some brute-force multi-pass ways to do this (essentially using variants of imagemagick's hit-and-miss filter and then looping over the image looking for leftover dots) but I'm hoping for better ideas from people who actually know about image processing.

see the links for 'regular' and 'barred' – Martin DeMello Jan 31 '14 at 03:58 — Martin DeMello, Jan 31 '14 at 03:58

score 9 · Answer 1 · edited May 23 '17 at 12:31

This is a really broad question, but I wil try to give you some pointers. These are the steps you need to take:

Detect the position of the crossword.
Detect the grid of the crossword. For this, you will need some Computer Vision algorithm (for example the Hough lines detector).
For each cell, you need to find if it have a character or not. To do so, you just simply analize the "amount" of white color does the cell have
For the cells containing a character you need to recognize it. To do so, you need an OCR, and I recommend you Tesseract.
Create your own algorithm for solving the crosswords. You can use this.

And here (1,2,3) you have an example of a Sudoku Solver in Python. The first steps are common to your problem so you can use OpenCV to solve it like this:

import cv2
import numpy as np

#Load the Black and White image
img =  cv2.imread('sudoku.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

#Detect the lines of the sudoku
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

#Detect the square of the Sudoku
biggest = None
max_area = 0
for i in contours:
        area = cv2.contourArea(i)
        if area > 100:
                peri = cv2.arcLength(i,True)
                approx = cv2.approxPolyDP(i,0.02*peri,True)
                if area > max_area and len(approx)==4:
                        biggest = approx
                        max_area = area

it's actually a far more defined problem than that; all i want is to extract the grid, ignoring any numbers or fill (i want to take grids embedded in pdfs and import them into a more interactive program). opencv looks great, though, thanks for the pointer. — Martin DeMello, Jan 30 '14 at 23:36

score 2 · Accepted Answer · answered Jan 30 '14 at 09:22

Using a screenshot of the linked crossword as example, I assume that:

the crossword grid is crisp, i.e. the horizontal and vertical grid lines are drawn at exact pixels with a constant dark colour and that there is no noise inside the grid cells,
the crossword is black or another relatively dark colour ("black") on white or light grey ("white"),
the clue numbers are written in the top left corner,
the crossword is rectangular and regular.

You can then scan the image from top to bottom to find horizontal black lines of sufficient length. A line starts with a black pixel and ends with a white pixel. Other pixels are indicators that it is not a line. (This is to weed out text and buttons.) Do the same for vertical lines.

Ideally, you now have the crossword lines. If your image is not cropped to the crossword, you might have false positives, such as the button borders. To find the crossword lines, sort them by length and look for the largest contiguous block of the same length. These should be your crossword lines unless you hae some degenerate cases

Now do a nested loop of horizontal and vertical lines, but skip the first line. Look two or three pixels to the northwest of the intersection of the lines. If the pixel is dark, that's a blank. If it is light, it's a cell. This heuristic seems to work well. I say dark and light here, bacause some crosswords use grey cells to save on ink when printing and some cell are highlighted in the screenshot.

If you end up with no blanks, you have a barred crossword. You can find the bars by checking whether one of the pixels to the left and right of a cell border is black.

Lastly, a tip: If you want to use your algorithm to find the cells in a crossword generated with the Crossword Compiler, look at the source. You will find a link to a Javascript file /puzzles/sample/cryptic_demo/cryptic_demo_xml.js, which contans the crossword as XML string, which also gives you the clues as a bonus.

Older versions of the Crossword Compiler, such as the one used for the Independent Cryptic hide their data in a file loaded from an applet. The format of that file is binary, but not too hard to read if you know the original data.

thanks, that seems like a nice, straightforward procedure. that's a useful tip about crossword compiler too; i was mostly thinking of crosswords in images from pdfs, but i'll add an option to read them from webpages somewhere down the line. — Martin DeMello, Jan 30 '14 at 23:38

score 1 · Answer 3 · answered Jan 30 '14 at 08:54

1

Try hough transform to find squares and when you get the squares check using histogram whether it is a dark or white square using threshold on its gray scale values

answered Jan 30 '14 at 08:54

Vikram Bhat

6,106
3
20
19

score 1 · Answer 4 · answered May 02 '14 at 20:34

Thinking of an alternative way to do this.

This is similar in many respects to object recongition, computer vision

One way would be to use a framework like openCV which, trained with some samples of what you want to detect, can detect any similar results

(a javascript library for object detection based on Viola-Jones algorithm, used also by openCV and of which am the author is HAAR.js)

Apart from this (or a similar alternative to this) there is a possibility of constructing a "visual" template of a crossword you want to detect (in a scale-invariant way)

and scan the images looking for correlations of parts of the image with the template (complexity O(N*M), N size of image, M size of template)

Since crossword grids have relatively constant shapes (especially fixed outputs of crossword compilers) it should be relative easy to create a prototype template and have success in matching (and aligning) the detected regions to extract the shape information

Algorithms to classify and extract crossword grids from an image

4 Answers4

Linked