42

I have an image of which this is a small cut-out:

Image with a lot of white and black pixels

As you can see it are white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.

How can I find the largest convex black area in this image that doesn't contain a white pixel in it?

Here is a small hand-drawn example of what I mean by the largest convex black area:

Small example

P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.

Jean-François Corbett
  • 37,420
  • 30
  • 139
  • 188
orlp
  • 112,504
  • 36
  • 218
  • 315
  • 5
    if this are primes, why are they quite uniformly distributed? also, can you show us a small example (10x10 image) with the largest area marked? – Karoly Horvath Sep 07 '11 at 10:26
  • 2
    just to check - what's "best" mean? Fastest? Smallest memory? Smallest code (number of lines)? – Stuart Sep 07 '11 at 10:26
  • 1
    @yi_H: I don't know, but if you don't believe me you can check the first 40 pixels manually and write a little program to check the rest. – orlp Sep 07 '11 at 10:28
  • @Stuart: Well, in my case I think I'll go for Fastest, but because of the sheer size I think memory mustn't be neglected. But be creative, a golf solution would be nice too :) – orlp Sep 07 '11 at 10:30
  • 3
    Are you sure this gives a meaningful result? Integers have just one dimension. You only get an area because you break your numbers at a specific point and start a new row. You'll most probably get different largest black areas depending on where you break them. – Andreas Sep 07 '11 at 10:33
  • @yi_H: I added an example image. – orlp Sep 07 '11 at 10:39
  • 4
    No matter if these are really primes - the question is certainly interesting on its own. – Ingo Sep 07 '11 at 10:39
  • 1
    @DonAndre: It's more of a fun thing, I just had this thought of generating this image like this. I might've taken a contrast extraction of google maps or anything else. – orlp Sep 07 '11 at 10:41
  • Sounds like you're needing two algorithms, one to define the areas and then another to find the largest. In your example images the boundaries are not on specific pixels, which is a problem. We are still having to guess at your meaning. – Gerry Sep 07 '11 at 10:42
  • 1
    Looking at the example picture two things come to mind: 1) this is probably `O( a lot )` and 2) the largest area will be really big if you can ever calculate it. – JJJ Sep 07 '11 at 10:46
  • @Juhana: The area can not exceed 10000000, since that is the size of the complete image. The circumference however might not be that small :) – orlp Sep 07 '11 at 10:48
  • Your boundaries appear like constellations, very much up to the imagination. – Gerry Sep 07 '11 at 10:48
  • @nightcracker - I did have a "golf" go... but then had to delete it as I'm not sure I understand your definitions/diagrams - will have a play again later! – Stuart Sep 07 '11 at 10:50
  • This is a great question. I have no idea how to solve it. My first thought is to turn it into a list of triangles and then taking each triangle as a starting point try to merge in adjacent triangles that follow the rules until no more can be merged.... then start again with the next triangle as a starting point and see if you do better. That won't work, but I intend to think about this some more – jcoder Sep 07 '11 at 10:51
  • 4
    you're problem isn't well defined at all. because if the area can be concave you can draw a poligon that includes all the points and the area would be exactly the entire black area. one other thing would be if you requested the biggest convex area. but in your example that area is concave – Simone Sep 07 '11 at 10:52
  • @simone he said that the connecting lines need to be more than 1 pixel apart which should limit that. – jcoder Sep 07 '11 at 10:53
  • I wonder if math.stackexchange.com might be able to help on this one? – jcoder Sep 07 '11 at 10:53
  • 2
    @John it still seems to me not well defined..a nice problem would be finding the largest CONVEX black area and i will work on this and post the answer about this – Simone Sep 07 '11 at 10:59
  • 1
    Alright, I changed the requirements to simply convex. You were right, this weren't an interesting problem because my requirements were flawed. – orlp Sep 07 '11 at 13:37
  • 1
    @nightcracker: well the example doesn't show the largest available area.. there is room in the top left corner and also on the top and bottom. BTW as I said these cannot be prime numbers, as N increases there is less and less primes around N. – Karoly Horvath Sep 07 '11 at 14:38
  • @yi_H: Both wrong. Those ARE the prime numbers (they are arranged horizontally, look at the bottom rows and you'll notice they're less dense) and there may be no angles larger than 180 degrees in the shape (one part of the definition of convex). – orlp Sep 07 '11 at 14:44
  • @nightcracker: BOTH wrong. if you extend the top left corner it's still going to be convex. and they are definitely not prime numbers, they should be a *lot* less dense there. – Karoly Horvath Sep 07 '11 at 14:52
  • @yi_H: Unless I'm missing something that would create an angle greater than 180 degrees, would you please create an image of it? And let's see, the mostleft (0,0) pixel represents the number 0, so the pixel at (x, y) represents (10000*y+x). Let's check the white pixel at (3827,975). This is the number 9753827 - prime. Go check another pixel yourself. – orlp Sep 07 '11 at 15:03
  • http://stackoverflow.com/q/7161332/615464 has a similar question, but this deals with rectangles. There's also a Wikipedia page on *that* problem: http://en.wikipedia.org/wiki/Largest_empty_rectangle – Sjoerd C. de Vries Sep 08 '11 at 07:06
  • 1
    There is also the question which pixels make up the polygon's area. Are these the pixels not touched at all by the polygon's edges (as shown in red in the example) or the pixels whose midpoint isn't touched? And how is the edge actually defined? Is it the mathematically thin line drawn between two pixel midpoints or the pixellated line connecting the two pixels à la Bresenham (http://en.wikipedia.org/wiki/Bresenham's_line_algorithm)? The above mentioned axis-aligned rectangle problem is less ambiguous in that sense. – Sjoerd C. de Vries Sep 08 '11 at 08:54
  • The mathematical thin line, to keep it simple. – orlp Sep 08 '11 at 15:17

5 Answers5

12

Trying to find maximum convex area is a difficult task to do. Wouldn't you just be fine with finding rectangles with maximum area? This problem is much easier and can be solved in O(n) - linear time in number of pixels. The algorithm follows.

Say you want to find largest rectangle of free (white) pixels (Sorry, I have images with different colors - white is equivalent to your black, grey is equivalent to your white).

enter image description here

You can do this very efficiently by two pass linear O(n) time algorithm (n being number of pixels):

1) in a first pass, go by columns, from bottom to top, and for each pixel, denote the number of consecutive pixels available up to this one:

enter image description here

repeat, until:

enter image description here

2) in a second pass, go by rows, read current_number. For each number k keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). Close the sums (potential rectangles) for k > current_number and look if the sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all opened potential rectangles (for all k).

This way you will obtain all maximum rectangles. It is not the same as maximum convex area of course, but probably would give you some hints (some heuristics) on where to look for maximum convex areas.

Tomas
  • 57,621
  • 49
  • 238
  • 373
  • As I wrote in the question comments: http://stackoverflow.com/q/7161332/615464 has a similar question, but this deals with rectangles. *I don't think I saw this solution there* There's also a Wikipedia page on that problem: http://en.wikipedia.org/wiki/Largest_empty_rectangle – Sjoerd C. de Vries Sep 21 '11 at 15:30
  • @TMS Finding the largest rectangular area would merely be a duplicate of this question: http://stackoverflow.com/questions/2478447 – tommy.carstensen May 24 '15 at 00:50
11

I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).

The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be with p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).

The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by considering the sign of their dot product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, qi - 1, qi, and qj.

The base cases where i = 0 are easy, since the only “polygon” is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k ≤ i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate dot product).

At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.

  • 1
    As it turns out, there's an O(n^3) algorithm due to Edelsbrunner & Guibas 1989 and Dobkin, Edelsbrunner & Overmars 1990. I think mine is O(n^4). – the guy formerly known as d Sep 18 '11 at 23:55
  • 4
    The latter paper can be found at http://www.cs.duke.edu/~edels/Papers/1990-J-05-EmptyConvexPolygons.pdf, the former at http://www.cs.duke.edu/~edels/Papers/1989-J-01-TopologicalSweep.pdf – Sjoerd C. de Vries Sep 19 '11 at 20:13
5

You could try treating the pixels as vertices and performing Delaunay triangulation of the pointset. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.

tkerwin
  • 9,559
  • 1
  • 31
  • 47
  • Good point. Some flood-filling algorithm can do the trick, given that it is easy to check if adding a triangle leaves the current configuration convex. – Alexandre C. Sep 07 '11 at 20:26
  • I doubt a greedy algorithm would work for finding the largest shape with those criteria. A dynamic programming solution might be needed. – tkerwin Sep 07 '11 at 20:31
  • Each triangle is contained in exactly one maximal convex connected component. Since such components are expected to be small, a greedy algorithm could do. – Alexandre C. Sep 07 '11 at 21:14
  • 3
    Your second sentence sounds to me like wiping the problem under the carpet. – Sjoerd C. de Vries Sep 08 '11 at 09:00
  • @Sjoerd It doesn't matter because the largest convex polygon might not respect the Delaunay triangulation anyway. Take a unit hexagon with two sides parallel to the y-axis, and then put two more points at (±1e6, 0). The triangulation blocks the six-sided polygon that excludes the top and bottom of the hexagon. – the guy formerly known as d Sep 19 '11 at 13:53
  • @theguyformerlyknownasd: So are you saying this answer is incorrect? – orlp Sep 21 '11 at 10:24
  • @nightcracker It might be a useful approximation, but there are some inputs for which any algorithm in this framework is not going to return the correct answer. – the guy formerly known as d Sep 21 '11 at 11:57
2

If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at: http://en.wikipedia.org/wiki/Connected-component_labeling

Jean-Denis Muys
  • 6,772
  • 7
  • 45
  • 71
  • It's not directly this problem, since all black pixels are connected. You might call it "convex component labeling", and even if they may be related, they are different problems. – static_rtti Sep 07 '11 at 14:19
  • @static_rtti: This is because the convex requirement wasn't in place first, because I didn't formulate the question good enough here, and for myself. A comment on the question enlightened me. Perhaps the asker of this question can retreat his answer? – orlp Sep 07 '11 at 14:29
  • Yeah I had missed the "convex" in your question. Sorry about that. – Jean-Denis Muys Sep 07 '11 at 14:31
  • Jean-Denis-Muys: No worries, I edited it in an hour ago because my original problem wasn't interesting at all (you can connect all but the outermost points). – orlp Sep 07 '11 at 14:53
1

I thought of an approach to solve this problem:

Out of the set of all points generate all possible 3-point-subsets. This is a set of all the triangles in your space. From this set remove all triangles that contain another point and you obtain the set of all empty triangles.

For each of the empty triangles you would then grow it to its maximum size. That is, for every point outside the rectangle you would insert it between the two closest points of the polygon and check if there are points within this new triangle. If not, you will remember that point and the area it adds. For every new point you want to add that one that maximizes the added area. When no more point can be added the maximum convex polygon has been constructed. Record the area for each polygon and remember the one with the largest area.

Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.

I think you can reduce b) to be a problem of a) and then you only need to find the most efficient method to determine whether a point is within a triangle. The reduction of the search space can be achieved as follows: Take a triangle and increase all edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Good for us is that only 3 of those subregions can contain points that would adhere to the convexity constraint. Thus for each point that you test you need to determine if its in a convex-expanding subregion, which again is the question of whether it's in a certain triangle.

The whole polygon as it evolves and approaches the shape of a circle will have smaller and smaller regions that still allow convex expansion. A point once in a concave region will not become part of the convex-expanding region again so you can quickly reduce the number of points you'll have to consider for expansion. Additionally while testing points for expansion you can further cut down the list of possible points. If a point is tested false, then it is in the concave subregion of another point and thus all other points in the concave subregion of the tested points need not be considered as they're also in the concave subregion of the inner point. You should be able to cut down to a list of possible points very quickly.

Still you need to do this for every empty triangle of course.

Unfortunately I can't guarantee that by adding always the maximum new region your polygon becomes the maximum polygon possible.

Andreas
  • 6,447
  • 2
  • 34
  • 46