How do I make my flood fill algorithm more efficient?

Question

I have a flood fill function in my code that just returns a number on the cells filled. But it is excruciatingly slow, my A* pathfinding algorithm is exponentially faster than it. Here is the snippet:

   bool withinBoundaries(_2D::Point p) {
//cerr << "Checking if x: " << p.x << " y: " << p.y << " is within boundaries" << endl;
if (p.x <= 29 && p.y <= 19 && p.x >= 0 && p.y >= 0) {
    return true;
}
return false;
}


bool canMove(_2D::Point point, _2D::Point offset, map<pair<int, int>, bool> gameGrid) {
if (!gameGrid[(point + offset).toPair()] && withinBoundaries(point + offset)) return true;
else return false;
}

int floodFillGetArea(_2D::Point point, map<pair<int, int>, bool> gameGrid) {
    map<pair<int, int>, bool> checked;
    map<pair<int, int>, bool> gameGridCopy = gameGrid;
    deque<_2D::Point> openPoints;
    openPoints.push_back(point);
    int count = 0;

    while (!openPoints.empty()) {
    _2D::Point curPoint = openPoints.back();
    openPoints.pop_back();
    if(checked[curPoint.toPair()]) break;
    gameGridCopy[curPoint.toPair()] = true;
    count++;
    if (canMove(curPoint, _2D::Point::UP(), gameGridCopy)) {

        openPoints.push_back(curPoint + _2D::Point::UP());
    }
    if (canMove(curPoint, _2D::Point::RIGHT(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::RIGHT());
    }
    if (canMove(curPoint, _2D::Point::DOWN(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::DOWN());
    }
    if (canMove(curPoint, _2D::Point::LEFT(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::LEFT());
    }
    checked[curPoint.toPair()] = true;
}
return count;
}

This checks a node every second, why is it so slow? And _2D::Point is just an object with int x and int y. Also it creates open nods when they are already labelled as being checked, why?

You pass `gameGrid` and `checked` by copy instead of by const references... — Jarod42, Sep 21 '16 at 11:46
I think you will have better answer on http://codereview.stackexchange.com/ — Garf365, Sep 21 '16 at 11:46
gameGrid is static it's just a test only the function manipulates it to check the area. Also checked is supposed to be local I don't know why I put that as a function param — rakagunarto, Sep 21 '16 at 11:48
Is this your real code? `if(checked[curPoint.toPair()]) break;` looks odd. — Henrik, Sep 21 '16 at 12:25
Yeah it is, I have a function called toPair for my coordinate structs because maps dno't work well with objects — rakagunarto, Sep 21 '16 at 23:41
may be a silly questions but 1. Are You filling 2D grid or some kind of graph (people here are forgetting to specify that especially for graphs without realizing that the difference is a big one)? 2. Are you also changing the cell values or just counting the area involved? — Spektre, Sep 22 '16 at 06:51
Sorry if it wasn't clear, it is a graph but the nodes are adjacent to each other like a grid. I'm just counting the cell area, I'm not changing the cell values except in the function copy to track which cells have been checked — rakagunarto, Sep 22 '16 at 11:27

score 3 · Answer 1 · answered Sep 21 '16 at 12:04

3

For starter, you copy a std::map object 6 times for nothing.

try passing the map object as const reference rather than pass it by value (=copy)

answered Sep 21 '16 at 12:04

David Haim

25,446
3
44
78

No observable difference, but technically yes it should be faster – rakagunarto Sep 21 '16 at 12:09
have you compiled it on release mode with optimizations on? – David Haim Sep 21 '16 at 12:11
I have compiled it with optimizations on, but without it on, it shouldn't be this slow. My unoptimised A* pathfinding does like 500 nodes/millisecond + visualisation – rakagunarto Sep 21 '16 at 12:21

score 1 · Answer 2 · edited May 03 '17 at 16:52

Can you try with a 2D array of booleans instead of a map (which takes O(log N) operations to look up) for "gameGrid" object? Maybe switching to a hashtable could give similar performance if an array is not an option?

If there are 1024 points in a map, this should be multiplying the benchmark speed by 10-11.

If this is not enough, you can have multiple threads and multiple stages (synchronizing) to compute 1, 4, 16, 20, 20, 20, 16, 4, 1, 1, 4, 16, 2, 4... tiles at a time and finish quicker.

Try something like this (scanline fill?) on the array:

   while(canMove(right)){enqueue} // Sequential memory access
   reset horizontal pos, enable reverse prefetching
   while(canMove(left)){enqueue} // Sequential memory access

....

   then repeat for verticals

Then repeat with alternating horizontal+vertical computations until done. This pattern should be giving x2 memory performance multiplier, and on top of that a 10-11x array access timing upgrade, because it will not access to structs (pair) from memory, just a single byte.

As another way, you could also start multiple fill points at random places and iterate until one of them touches original flood, add its points to original points area, move all its work the original floot thread, and in the end release all non-touching floods. This could use cache more frequently maybe? But it needs more cores. Also trying a gpgpu version wouldn't hurt.

Note: This would implicitly enforce another optimization that @David Haim suggested. With all this optimizations, you must be seeing the exponential speedup that you mentioned about your A* algorithm.

Also iterating through a map means accessing the RAM in random order, whereas with an array you are accessing the RAM sequentially. — SirGuy, Sep 21 '16 at 12:45

How do I make my flood fill algorithm more efficient?

2 Answers2