1

I have a flood fill function in my code that just returns a number on the cells filled. But it is excruciatingly slow, my A* pathfinding algorithm is exponentially faster than it. Here is the snippet:

   bool withinBoundaries(_2D::Point p) {
//cerr << "Checking if x: " << p.x << " y: " << p.y << " is within boundaries" << endl;
if (p.x <= 29 && p.y <= 19 && p.x >= 0 && p.y >= 0) {
    return true;
}
return false;
}


bool canMove(_2D::Point point, _2D::Point offset, map<pair<int, int>, bool> gameGrid) {
if (!gameGrid[(point + offset).toPair()] && withinBoundaries(point + offset)) return true;
else return false;
}

int floodFillGetArea(_2D::Point point, map<pair<int, int>, bool> gameGrid) {
    map<pair<int, int>, bool> checked;
    map<pair<int, int>, bool> gameGridCopy = gameGrid;
    deque<_2D::Point> openPoints;
    openPoints.push_back(point);
    int count = 0;

    while (!openPoints.empty()) {
    _2D::Point curPoint = openPoints.back();
    openPoints.pop_back();
    if(checked[curPoint.toPair()]) break;
    gameGridCopy[curPoint.toPair()] = true;
    count++;
    if (canMove(curPoint, _2D::Point::UP(), gameGridCopy)) {

        openPoints.push_back(curPoint + _2D::Point::UP());
    }
    if (canMove(curPoint, _2D::Point::RIGHT(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::RIGHT());
    }
    if (canMove(curPoint, _2D::Point::DOWN(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::DOWN());
    }
    if (canMove(curPoint, _2D::Point::LEFT(), gameGridCopy)) {
        openPoints.push_back(curPoint + _2D::Point::LEFT());
    }
    checked[curPoint.toPair()] = true;
}
return count;
}

This checks a node every second, why is it so slow? And _2D::Point is just an object with int x and int y. Also it creates open nods when they are already labelled as being checked, why?

Spektre
  • 49,595
  • 11
  • 110
  • 380
rakagunarto
  • 325
  • 3
  • 23
  • 2
    You pass `gameGrid` and `checked` by copy instead of by const references... – Jarod42 Sep 21 '16 at 11:46
  • I think you will have better answer on http://codereview.stackexchange.com/ – Garf365 Sep 21 '16 at 11:46
  • gameGrid is static it's just a test only the function manipulates it to check the area. Also checked is supposed to be local I don't know why I put that as a function param – rakagunarto Sep 21 '16 at 11:48
  • 1
    Is this your real code? `if(checked[curPoint.toPair()]) break;` looks odd. – Henrik Sep 21 '16 at 12:25
  • Yeah it is, I have a function called toPair for my coordinate structs because maps dno't work well with objects – rakagunarto Sep 21 '16 at 23:41
  • may be a silly questions but 1. Are You filling 2D grid or some kind of graph (people here are forgetting to specify that especially for graphs without realizing that the difference is a big one)? 2. Are you also changing the cell values or just counting the area involved? – Spektre Sep 22 '16 at 06:51
  • Sorry if it wasn't clear, it is a graph but the nodes are adjacent to each other like a grid. I'm just counting the cell area, I'm not changing the cell values except in the function copy to track which cells have been checked – rakagunarto Sep 22 '16 at 11:27

2 Answers2

3

For starter, you copy a std::map object 6 times for nothing.

try passing the map object as const reference rather than pass it by value (=copy)

David Haim
  • 25,446
  • 3
  • 44
  • 78
1

Can you try with a 2D array of booleans instead of a map (which takes O(log N) operations to look up) for "gameGrid" object? Maybe switching to a hashtable could give similar performance if an array is not an option?

If there are 1024 points in a map, this should be multiplying the benchmark speed by 10-11.

If this is not enough, you can have multiple threads and multiple stages (synchronizing) to compute 1, 4, 16, 20, 20, 20, 16, 4, 1, 1, 4, 16, 2, 4... tiles at a time and finish quicker.

Try something like this (scanline fill?) on the array:

   while(canMove(right)){enqueue} // Sequential memory access
   reset horizontal pos, enable reverse prefetching
   while(canMove(left)){enqueue} // Sequential memory access

....

   then repeat for verticals

Then repeat with alternating horizontal+vertical computations until done. This pattern should be giving x2 memory performance multiplier, and on top of that a 10-11x array access timing upgrade, because it will not access to structs (pair) from memory, just a single byte.

As another way, you could also start multiple fill points at random places and iterate until one of them touches original flood, add its points to original points area, move all its work the original floot thread, and in the end release all non-touching floods. This could use cache more frequently maybe? But it needs more cores. Also trying a gpgpu version wouldn't hurt.

Note: This would implicitly enforce another optimization that @David Haim suggested. With all this optimizations, you must be seeing the exponential speedup that you mentioned about your A* algorithm.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
huseyin tugrul buyukisik
  • 11,469
  • 4
  • 45
  • 97