I have a grayscale image of a comic strip page that features several dialogue bubbles (speech balloons, etc.). These are enclosed areas with a white background and solid black borders that contain text, something like this:
I want to detect these regions and create a mask (a binary one is fine) that covers the interior of every dialogue bubble, something like:
The same image, mask overlaid, to be totally clear:
So, my basic idea of the algorithm was something like:
1. Detect where the text is, i.e. plant at least one pixel in every bubble. Dilate these regions somewhat and apply a threshold to get a better starting ground; I've done this part:
2. Use a flood fill or some sort of graph traversal, starting from every white pixel detected as a pixel-inside-bubble in step 1, but working on the initial image: flood white pixels (which are supposed to be inside the bubble) and stop at dark pixels (which are supposed to be borders or text).
3. Use some sort of binary_closing operation to remove dark areas (i.e. regions that correspond to text) inside the bubbles. This part works OK.
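To make the flood-fill step (step 2) concrete, here is a minimal sketch of what it should compute, written without an explicit Python loop. It assumes SciPy is available and uses connected-component labeling in place of a literal flood fill; the function name `keep_seeded_regions` and the `white_thresh` parameter are made up for illustration, and the threshold value is a guess that would need tuning.

```python
import numpy as np
from scipy import ndimage


def keep_seeded_regions(gray, seeds, white_thresh=200):
    """Return a boolean mask of the white regions of `gray` that contain a seed.

    gray         -- 2-D grayscale image (original page)
    seeds        -- 2-D boolean array, True where step 1 found "inside bubble" pixels
    white_thresh -- hypothetical cutoff: pixels >= this count as white
    """
    white = gray >= white_thresh
    # Label 4-connected white regions; dark pixels (borders, text) get label 0.
    labels, _ = ndimage.label(white)
    # Collect the labels of every region touched by at least one white seed pixel.
    hit = np.unique(labels[seeds.astype(bool) & white])
    hit = hit[hit != 0]
    # Keep only pixels belonging to those regions.
    return np.isin(labels, hit)
```

The result still has holes where the text is, which is what step 3 is meant to remove.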
So far, steps 1 and 3 work, but I'm struggling with step 2. I'm currently working with scikit-image, and I don't see any ready-made algorithms like flood fill implemented there. Obviously, I can use something trivial like breadth-first traversal, basically as suggested here, but it's really slow when done in Python. I suspect that intricate morphology stuff like binary_erosion or generate_binary_structure in ndimage or scikit-image could do this, but I struggle to understand all that morphology terminology and how to implement such a custom flood fill with it (i.e. starting with the step 1 image, working on the original image, and writing the output to a separate output image).
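One morphology-based reading of such a custom flood fill, as a hedged sketch rather than a definitive answer: `scipy.ndimage.binary_propagation` repeatedly dilates a seed image while restricting growth to a mask, which amounts to flooding from the seeds through the white pixels only. The function name `flood_and_close` and the parameters `white_thresh` and `close_size` below are assumptions for illustration, not values taken from the setup described above.

```python
import numpy as np
from scipy import ndimage


def flood_and_close(gray, seeds, white_thresh=200, close_size=3):
    """Flood from `seeds` through white pixels of `gray`, then close text holes.

    white_thresh and close_size are hypothetical tuning parameters.
    """
    white = gray >= white_thresh
    # Step 2: grow the seeds, but only into white pixels (4-connected by default).
    filled = ndimage.binary_propagation(seeds.astype(bool) & white, mask=white)
    # Step 3: binary closing with a square element fills small dark gaps (text).
    structure = np.ones((close_size, close_size), dtype=bool)
    return ndimage.binary_closing(filled, structure=structure)
```

The seeds come from the step 1 image, the mask comes from the original image, and the result is a new array, so neither input is modified. Real text would need a larger `close_size` than the default used here.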
I'm open to any suggestions, including ones in OpenCV, etc.