
I'm having trouble finding an algorithm to remove the convexity of my photos. As you can see, the photos are captured from book pages, and I want to remove the page curvature. My question is similar to this, but all I have as input is the page boundaries; I have neither a grid nor a way to detect one with processing algorithms.

[image: input photo of a curved book page]

I want output like the right one in the photo below:

[image: original (left) and desired flattened output (right)]

Obviously, perspective transformation is the first thing that comes to mind. However, as you can see, the result is not promising:

[image: result of applying a perspective transformation]

hosh0425
  • In the below-right image, did you detect that rectangle or just draw it? – Yunus Temurlenk Feb 09 '20 at 12:14
  • No, I just drew it to call attention to it. – hosh0425 Feb 09 '20 at 18:32
  • Have you tried unwarping the perspective? You need a minimum of 4 input points (the corners of your rectangle) and 4 output points in the unwarped space. The 4 output points (the final positions of your 4 original input points) can be fixed if you pre-set the final rectangle. Maybe if you, somehow, detect the 4 corners of the page... – stateMachine Feb 10 '20 at 00:13
  • Yes, the first thing I tried was perspective transformation. The left image is a perspective transform of the original image. – hosh0425 Feb 10 '20 at 04:24
  • Apply a [4 point perspective transform](https://www.pyimagesearch.com/2014/08/25/4-point-opencv-getperspective-transform-example/) – nathancy Feb 10 '20 at 22:05
  • The first thing I tried was perspective transformation. I updated the question with the perspective-transformed image. – hosh0425 Feb 11 '20 at 00:03

1 Answer


Here's a possible pipeline to solve your problem. The main idea is to identify the text, create a super blob of it with some morphology, locate the 4 corners of this super blob and feed the points to a perspective "unwarper" (or rectifier, or whatever you wish to call that perspective correction method).

Start by converting your image to grayscale and applying adaptive thresholding to it. Try the Gaussian or Mean methods with whichever parameters best fit your tests. This is the result I obtained after fiddling with the values for a bit:

[image: adaptive threshold result]
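In OpenCV (Python), this step looks roughly like the following sketch; the block size and constant are just starting values to fiddle with, not the exact ones I used:

```python
import cv2

img = cv2.imread("page.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Adaptive threshold: try ADAPTIVE_THRESH_GAUSSIAN_C or ADAPTIVE_THRESH_MEAN_C.
# THRESH_BINARY_INV makes the ink white on a black background, which is what
# the blob and morphology steps below expect.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 25, 15)
```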

Now, the idea is to isolate just the text. The solution I applied is to obtain the biggest blobs and subtract them from the thresholded image. You're going to need a method to compute the area of each binary blob; check this previous post for suggestions on how to implement one.

These are the biggest blobs from the image:

[image: the biggest blobs]
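One possible implementation of such an area filter uses connectedComponentsWithStats; this is a sketch, not my exact code, and min_area is a threshold you'd tune for your image size:

```python
import cv2
import numpy as np

def area_filter(binary, min_area=0, max_area=np.inf):
    """Keep only the blobs whose area falls within [min_area, max_area]."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=4)
    out = np.zeros_like(binary)
    for i in range(1, n):  # label 0 is the background
        if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area:
            out[labels == i] = 255
    return out

# The biggest blobs: everything above a (tuned) area threshold.
big_blobs = area_filter(binary, min_area=5000)
```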

Subtract the largest blobs from the thresholded image. This is the result:

[image: big blobs subtracted, text remaining]

As you can see, the text is almost isolated. Let's clean up the leftover specks by applying the area filter again, this time to eliminate the small blobs. This is the result:

[image: text after small-blob removal]
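Both operations reuse what we already have: a subtraction plus a second pass of the area_filter sketched above (thresholds, again, are values to tune):

```python
# Remove the big blobs from the thresholded image, leaving (mostly) text.
text_only = cv2.subtract(binary, big_blobs)

# Second pass of the area filter, this time dropping the tiny specks.
text_clean = area_filter(text_only, min_area=20)
```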

Very good; some characters are lost in the operation, but that's OK. We need a nice continuous block of text, because we are going to dilate the hell out of it. I applied a rectangular structuring element of size 5 x 5 with 5 dilation iterations, then eroded the output with 5 more iterations, so you end up with this nice, isolated super blob where the text used to be:

[image: dilated and eroded super blob with centroid markers]
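With OpenCV's morphology functions, that is something like this (5 x 5 rectangular kernel and 5 iterations each way, as described above):

```python
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

# Dilate aggressively to merge the characters into one continuous blob...
dilated = cv2.dilate(text_clean, kernel, iterations=5)

# ...then erode by the same amount to shrink the blob back to size.
super_blob = cv2.erode(dilated, kernel, iterations=5)
```

Note that a dilation followed by an erosion with the same kernel is just a morphological closing, so a single cv2.morphologyEx call with cv2.MORPH_CLOSE would do the same job.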

Check it out. The 3 markers you see are the centroids of the biggest blobs I detected in the image. We need the 4 corners of the super blob, and the biggest blob in the image is what we are after, so I decided to reuse the area filter and look for the blob with the biggest area. This is the isolated super blob:

[image: isolated super blob]
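Isolating it is one more connected-components pass, this time keeping only the largest label (again a sketch):

```python
import cv2
import numpy as np

n, labels, stats, _ = cv2.connectedComponentsWithStats(super_blob, connectivity=4)
largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # skip label 0 (background)
isolated = (labels == largest).astype(np.uint8) * 255
```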

From here, the operations are pretty straightforward. Again, the goal is to get the four corners of this blob. You can fit a rectangle, or apply an edge detector followed by a Hough transform to get the straight lines that follow the edges of the super blob.

I decided to apply a Canny edge detector followed by a Hough transform. Of course, I tuned the transform to keep only the lines I'm interested in: straight lines above a certain length. This is the result of the line detection:

[image: detected lines with start/end point markers and corner estimates]
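Sketched with Canny plus the probabilistic Hough transform; every threshold here is a starting point to tune, not the value I actually used:

```python
import cv2
import numpy as np

edges = cv2.Canny(isolated, 50, 150)

# Probabilistic Hough transform: returns segments as (x1, y1, x2, y2).
# minLineLength drops the short segments we are not interested in.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=200, maxLineGap=20)
```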

There's some extra info plotted on the image above. The markers you see (red and yellow) are the start and end points of the lines. My idea here was to find a bunch of these lines and compute the mean of their endpoints. The points cluster into four "quadrants"; if we compute the mean of the start and end points per quadrant, we end up with 4 means, and these are the approximate positions of the super blob's corners!

I applied K-means to the start and end points of the lines, but you may very well prefer other methods of processing these points; that's OK. My approximate corners are identified by the big red O markers in the image above.
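A sketch of that clustering step: stack every start and end point and ask K-means for 4 centers, which approximate the corners:

```python
import cv2
import numpy as np

# Each detected segment is (x1, y1, x2, y2); reshape to one (x, y) per row.
points = lines.reshape(-1, 2).astype(np.float32)

# K-means with K = 4: the cluster centers approximate the 4 corners.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, _, centers = cv2.kmeans(points, 4, None, criteria, 10,
                           cv2.KMEANS_RANDOM_CENTERS)
corners = centers  # 4 x 2 array of approximate corner coordinates
```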

As I suggested, try giving these corners a fixed output position. I defined the red rectangle that the corners are mapped onto; for this test, I pretty much adjusted the rectangle manually. The perspective correction yields this result:

[image: perspective-corrected result]
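Putting the last step together as a sketch, assuming img is the original color image loaded earlier: order the 4 corners consistently, then map them onto the fixed output rectangle (the width and height here are placeholders; I adjusted mine manually):

```python
import cv2
import numpy as np

def order_corners(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = pts[np.argsort(pts[:, 1])]          # sort by y: top pair first
    top = pts[:2][np.argsort(pts[:2, 0])]     # sort the top pair by x
    bottom = pts[2:][np.argsort(pts[2:, 0])]  # sort the bottom pair by x
    return np.float32([top[0], top[1], bottom[1], bottom[0]])

src = order_corners(corners)
w, h = 900, 1350  # placeholder output size; set it to your target rectangle
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

M = cv2.getPerspectiveTransform(src, dst)
rectified = cv2.warpPerspective(img, M, (w, h))
```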

Some suggestions:

  1. Depending on the resolution of the input image, you could downsize it for a faster and better result, as your input seems big enough for that.

  2. Tune the Hough line detection to yield longer lines. My current configuration detects some shorter lines as well, and that can hinder the corner approximation.

  3. I chose a somewhat robust method for computing the 4 corners of the super blob that I've personally used before (edge detection + Hough line transform + K-means), but whatever processing chain you use to obtain the data is entirely up to you!

stateMachine
  • Thank you for your complete explanation. I tried it, and it works well when I detect the box correctly. But I think that if we feed the boundaries chosen by the user into the algorithms you described, we will get better results. Do you have any idea how those boundaries could help? – hosh0425 Feb 13 '20 at 00:17