Algorithm and data structure for solving the game "Globs"/flood fill/"FloodIt"

Question

Suggest an algorithm and data structure for solving the game Globs (http://www.deadwhale.com/play.php?game=131). It's pretty fun in a geeky kind of way.

State the time-space complexity (big-O) of your approach in terms of N, the size of the grid (N>=14). Good-enough efficient algorithms with low complexity are preferred.

(MatrixFrog correctly points out this game is also known as FloodIt, and Smashery gave a solution 3 months ago in the link he cites below. All you dudes suggesting pruning/greedy with only 1 lookahead, that gives suboptimal solutions.)

The game generates a random square grid of nxn nodes, where each node is colored one of six colors (Grn=1, Ylw=2, Red=3, Blu=4, Pur=5, Orn=6). Level 1 has 9x9 grid, then n increases each level, up to 14. Each level you can take up to 25 turns or else you lose. On each turn you choose which color to change the top left node to e.g. Grn->Red, such that any connected adjacent (horiz/vert) nodes of the new color get assimilated into a shape, and 1 pt per node assimilated is ADDED to your score. The scoring objective is to complete each grid in as few turns as possible, e.g. if you do it in 16 turns, then your 9 unused moves => 2*9 MULTIPLIER times your total accumulated score.

Obviously there are a ton of ways to decompose this, and the default choice of recursive backtracking with a 14x14 grid is a viable contender; What other types of data structures does this lend itself to? A* ? Don't get hung up on optimality, I'm wondering if there is a "good-enough" algorithm.

(I thought it might be a fun project to code up a robot and get silly-high scores. Although I scored 3.5E+12 all by my fleshware self.)

Just solve it! Do we use data structures at all, or just keep a 14x14 grid? If we use recursion, each stack frame will contain its own 14x14 grid + current score. — smci, Dec 28 '09 at 17:24
Was going to answer, but played a couple more rounds and realised I was using a silly lookahead that I wouldn't want to program! :D This is a little smarter than I first reckoned! Good question. — silentcontributor, Dec 28 '09 at 18:00
One way among others of looking at it is trying to discover the minimum-length path traversing the graph where a graph node represents a connected shape, and a connection represents adjacency. But the act of assimilation dynamically changes the graph topology, which is what makes it interesting. I think it's a waste of time and memory to construct the entire shape-adjacency graph though, I suggest that at each turn we only store the grid + those shapes which neighbor the shape connected to the TL blob (good for A* formulation?). And there's also the evolutionary algorithm approach. — smci, Dec 28 '09 at 18:55
I experimented with a few algorithms by hand, and while adjacency works pretty consistently for the first level it doesn't seem to go very far for the next. When I looked two hops out the behavior was better, but I think you might actually need to at least analyze the entire field to consistently get the best result. Fun question, thanks. — Tim Crone, Dec 28 '09 at 21:09
I think you should always go the most possible to the farthest node in the graph. And distance should be shortest possible path to each node. Whenever I play I always try to get to the other corner in the minimum number of moves and staying on the diagonal. That way you expand evenly in all directions. — Juan Besa, Dec 28 '09 at 21:23
@Tim: I said "recursive backtracking", I never suggested you should prune the fringe according to one lookahead. Recursive backtracking does in fact analyze ALL candidate solutions, while pruning suboptimal paths - that's precisely why I suggested it. @Juan: well we didn't yet define/agree what graph is best to use, but assuming we use a graph where nodes represent shapes and edges represent adjacency, the BR spot will on average tend to be farther from the TL spot than almost any other. But this observation is just an approximation; I had figured it out from playing manually. — smci, Dec 29 '09 at 01:32
This game also goes by the name "Flood It" so if anyone wants to google for solutions, that might help. http://www.labpixies.com/gadget_page.php?id=10 — Tyler, Dec 29 '09 at 06:33
In fact, this appears to be a duplicate of this question: http://stackoverflow.com/questions/1430962 — Tyler, Dec 29 '09 at 06:35
We should make a contest out of it, post a number of puzzle inputs, let people submit code to be judged, and post the results, including a number of random puzzles as well so they can't just hardcode a good solution to the sample inputs. — Lasse V. Karlsen, Dec 31 '09 at 19:45
@MatrixFrog: I didn't ask for an optimal solution, unlike that question. I was thinking a more efficient 'good-enough' solution with far lower complexity would be interesting. — smci, May 01 '15 at 23:07

score 5 · Accepted Answer · answered Jan 12 '10 at 18:53

This game really grabbed my interest, so I spent a couple of days working on it.

The first thing I noticed, is that it is easy to show that after the first board (maybe 2 in some cases), the fastest way to raise the score is by using the multiplier. Because of this, I built a system with the goal of solving each board in the fewest number of steps. I started out wanting to use A* because it is generally built for just these types of search problems... however, this problem still turned out to be a doozie.

When talking about A*, the effectiveness of it really boils down your choice of heuristic estimation. The closer you get to guessing the actual distance, the fewer nodes that will have to be expanded in order to reach the goal. For this problem, I went through a number of ideas for estimation, but most of them broke the A* rule, which is that you can NOT over estimate the actual distance, or else you break the optimality of A*.

There are a few that work however. Others in this thread have posted about just taking the number of remaining colors as the estimation, which is admissible because it cannot over estimate (you have to change colors at least once for each remaining color not part of the main "flood" area. The problem with this heuristic is that it very poorly estimates the actual distance. Take for instance the first move, which generally has an estimation of the number of colors, 6. It often expands into 2 moves, each of which generally has an estimation of 7, and so on and so on. Take this 5 levels deep and for a board size of 10x10, most leafs have an estimation of 11. This heuristic is basically an implementation of a breadth first search until you reach within 4 or 5 moves from your goal. This is not very efficient and in my own tests, the exponents run a much around board size 9, which often requires about 14 moves in the solution. It should be noted my solution was very high level however and not much care was taken to speed things up.

The problem is that A* is really only good when each step makes a significant refinement to the actual distance of the overall solution. Looking at the problem directly, you probably wont find a good heuristic that can do much better than this without over estimating the cost. However, if you transform the problem into another problem, better heuristics jump out at you. The heuristic "number of colors remaining" is answering the question, what is the smallest number of possible moves remaining. To the answer that question, I asked myself "which spot on the board requires the maximum number of steps to get to"? I ended up settling on the answer to "how many steps is it to the bottom right corner" for my heuristic. This is fairly easy to implement by running another A* search that works more like finding map directions and then counting the number of steps in the solution. I realize this is an arbitrary point on the board to select, however it worked quite well in testing and running A* on every remaining point took a fair amount of time on my single processor test machine.

This heuristic alone had a tendency to collapse after the bottom right corner became part of the flooded area however, so the final result was MAX(bottom right corner min steps, number of colors remaining not part of main flood). This was finally able to achieve some very large board sizes in under a second with my high level implementation.

I'll leave the record setting to you.

@Nick: "The problem is that A* is really only good when each step makes a significant refinement to the actual distance of the overall solution" <=> needs a good heuristic. Why not use "how many steps is it to the most distant square (in the color-connectivity sense, not Euclidean sense"? — smci, Jan 12 '10 at 20:25
I'm thinking by color-connectivity sense that you mean the number of color changes you have to go through in order to get to the answer, which brings you right back to the original problem, having no heuristic better than the number of colors not connected to the main flood. With Euclidean distance, you have an easily admissible (and good) heuristic and then you count the number of steps in the solution to the sub problem. — Nick Larsen, Jan 12 '10 at 21:01
No, I had proposed "distance of the most distant point in terms of the the number of successive color changes needed for the glob to reach it". But using Manhattan (not Euclidean) distance to most-distant-node, similar to what you suggest, would be way faster and require less computation. Strictly all we would need to do is check the corners/outer perimeter of the glob, and compute Manhattan distance to the most distant non-globbed nodes (in BR, BL, TR, TL). That should evaluate very fast. — smci, Jan 15 '10 at 02:17
It turns out to be very fast to use Dijkstra's algorithm for shortest path to all nodes, find the highest number and take the greater of that or number of non connected colors remaining. Its also way less complicated than my nested A* original approach. — Nick Larsen, Jan 25 '10 at 20:15
If you use color connectivity as your distance than there is a clear number of minimum steps to each node and you could use that distance in your heuristic. I think there is two more important factors in each step. The first is you should also try to maximize the number of squares you fill. Whenever I play the game I try to keep myself close to the diagonal and get to the south-east corner as fast as possible. This tends maximize my later moves since I flood in all directions. The second is that if you can flood all remaining squares of a color this is a free move. — Juan Besa, Jun 22 '10 at 01:03
@Jaun Besa Point 2 is correct, once you can flood all remaining squares of a color, there is no point in waiting. Point 1 however is not necessarily correct when you incorporate the value of the round. Since score is increased by selecting the most of one color at the same time, it can often be a good idea to flood a smaller color now in hopes of building a larger value from a subsequent move. — Nick Larsen, Jun 22 '10 at 13:24

score 1 · Answer 2 · answered Dec 31 '09 at 19:18

Given the fixed starting state and limited number of moves I think you can fully explore a decision tree. For each round, there are only 5 possible moves and wasted moves (choosing a color that will not 'glob' any neighbors what-so-ever) can be eliminated as the tree is built. Once the decision tree is built I think you could explore the point value of each path but if you needed more optimization a A* would definitely get you close.

For each round, I would have the basic state as a matrix of bit arrays for the state of the unglobbed locations (since the color no longer matters in the globbed locations you could save memory on your state data structure by leaving off the color bits) and a point value for each decision possible. Then your A*, or breadth first algorithm can just maximize the path values as normal. Save the path, and once your analysis is complete, make all of the determined moves.

score 1 · Answer 3 · answered Sep 26 '11 at 20:04

Another approach is to use genetic algorithms. Since any (partial) solution consists of a list of colors, it translates very nicely to a gene. A fitness function could be something like 4 times the connected component minus the number of colors used total (length of gene).

I tried this on 10x10 boards in Mathematica, with a very non-optimized algorithm, and got a short solution rather quickly. I do not claim it is optimal, but given enough time, the randomness in the process of mutating the genes will ensure that you eventually ends up with an optimal solution.

score 0 · Answer 4 · answered Sep 16 '13 at 10:01

A good heuristic is to generate the color-connected distance map. E.g. the current flood is at distance zero. A group of colors connected to a square at distance 'i' is at distance 'i+1'.

Next, observe how many colors are at the maximum distance. We need maximum distance moves to eliminate one color at the maximum distance, and we need an additional move to eliminate each additional color at the maximum distance. If all colors are not at the maximum distance, consider the colors at the previous distance which have not yet been eliminated. We might eliminate one of these colors while making 'maximum distance' moves, but we will need a move to eliminate each additional color.

This provides a fairly good estimate. On the initial 14x14 board position, I tend to get estimates of 17 to 18 while needing 20 to 22 moves for an optimal solution. A 14x14 board can typically be solved, with this lower bound, while looking at about 10,000 board positions. (Using the optimization of making a move that eliminates a color if such a move is available.)

amir beygi · Answer 5 · 2010-01-04T18:58:20.070

0

if you can, eliminate a color.
chose the color that generate more new neighbors for you !
goto step 1.

edited Jan 04 '10 at 18:58

answered Dec 29 '09 at 05:55

amir beygi

1,234
8
13

Step 1 is almost correct, but does not necessarily generate the highest score in the case when you know you wont finish the alloted number of moves. Step 2 can be easily shown to be non optimal when you take into account the score for accepting smaller steps now in preparation for taking larger steps later. – Nick Larsen Jan 08 '10 at 14:17
Could scan with depth of 1 or 2 generates a better result? – amir beygi Jan 09 '10 at 05:12
@Amir Step 1 is fine, Step 2 is pruning too early and with a suboptimal heuristic. Step 1: let's not worry too much about insoluble grids, and certainly not sacrifice performance in the common-case where the grid is soluble. Eliminating a color is in general good because it reduces the branching factor for all future regression. Step 2: Depth-1 or -2 is pruning too early. Depth-26 is full recursion (on a 14x14 grid) A side-question would be to investigate the effect of the pruning horizon on optimality. (Maybe say depth-12 is "good-enough"). – smci Jan 12 '10 at 20:22

score 0 · Answer 6 · answered Dec 30 '09 at 14:13

0

A brute-force recursive search will find the maximum score. You have at most 5^25 ending states to consider. Many intermediate states will be equivalent; it may be faster to recognize these and prune the search space of duplicates. Keep track of the highest score found so far as you search, along with the path (sequence of moves) that it took to get there.

answered Dec 30 '09 at 14:13

William Graham

9
1

2

While it is true that a brute force search will find the answer, I dont know of any system which can evaluate 298 million billion positions in any reason amount of time. Most of the pruning for this specific scenario will be in the very first move (at most 2 possible next useful states) and last 5 moves, which will keep you from finding a good answer in any reasonable amount of time. That being said, it is technically a valid solution. – Nick Larsen Jan 08 '10 at 14:24

score 0 · Answer 7 · answered May 01 '15 at 09:25

Another optimization is that there are some color blobs that do not need to be taken right away. There can be leafs in the network distance graph where one color blob does not have a further neighbor. This color blob doesn't need to be taken until the furthest blob of the same color. Using this approach, we can adjust the distance map to get the minimum time at which a color blob must be taken.

For some board positions, we will be able to show that an eligible color does not need to be taken on the next turn. We can avoid that color and reduce the branching factor.

Algorithm and data structure for solving the game "Globs"/flood fill/"FloodIt"

7 Answers7

Linked