Wikipedia A* pathfinding algorithm takes a lot of time

Question

I've successfully implemented A* pathfinding in C# but it is very slow, and I don't understand why. I even tried not sorting the openNodes list but it's still the same.

The map is 80x80, and there are 10-11 nodes.

I took the pseudocode from here Wikipedia

And this is my implementation:

 public static List<PGNode> Pathfind(PGMap mMap, PGNode mStart, PGNode mEnd)
    {
        mMap.ClearNodes();

        mMap.GetTile(mStart.X, mStart.Y).Value = 0;
        mMap.GetTile(mEnd.X, mEnd.Y).Value = 0;

        List<PGNode> openNodes = new List<PGNode>();
        List<PGNode> closedNodes = new List<PGNode>();
        List<PGNode> solutionNodes = new List<PGNode>();

        mStart.G = 0;
        mStart.H = GetManhattanHeuristic(mStart, mEnd);

        solutionNodes.Add(mStart);
        solutionNodes.Add(mEnd);

        openNodes.Add(mStart); // 1) Add the starting square (or node) to the open list.

        while (openNodes.Count > 0) // 2) Repeat the following:
        {
            openNodes.Sort((p1, p2) => p1.F.CompareTo(p2.F));

            PGNode current = openNodes[0]; // a) We refer to this as the current square.)

            if (current == mEnd)
            {
                while (current != null)
                {
                    solutionNodes.Add(current);
                    current = current.Parent;
                }

                return solutionNodes;
            }

            openNodes.Remove(current);
            closedNodes.Add(current); // b) Switch it to the closed list.

            List<PGNode> neighborNodes = current.GetNeighborNodes();
            double cost = 0;
            bool isCostBetter = false;

            for (int i = 0; i < neighborNodes.Count; i++)
            {
                PGNode neighbor = neighborNodes[i];
                cost = current.G + 10;
                isCostBetter = false;

                if (neighbor.Passable == false || closedNodes.Contains(neighbor))
                    continue; // If it is not walkable or if it is on the closed list, ignore it.

                if (openNodes.Contains(neighbor) == false)
                {
                    openNodes.Add(neighbor); // If it isn’t on the open list, add it to the open list.
                    isCostBetter = true;
                }
                else if (cost < neighbor.G)
                {
                    isCostBetter = true;
                }

                if (isCostBetter)
                {
                    neighbor.Parent = current; //  Make the current square the parent of this square. 
                    neighbor.G = cost;
                    neighbor.H = GetManhattanHeuristic(current, neighbor);
                }
            }
        }

        return null;
    }

Here's the heuristic I'm using:

    private static double GetManhattanHeuristic(PGNode mStart, PGNode mEnd)
    {
        return Math.Abs(mStart.X - mEnd.X) + Math.Abs(mStart.Y - mEnd.Y);
    }

What am I doing wrong? It's an entire day I keep looking at the same code.

What is the size of your map? The number of expansions depends on the distance to the target node, and that depends on the map size. — Please treat your mods well., Feb 01 '11 at 16:11
How many nodes do you have in the list? What exactly is "slow" - how long it takes to do something? — Suma, Feb 01 '11 at 16:12
There are 11 rooms and 10 total pathfinding-connections. The Pathfind method takes about 8-10 seconds. — Vittorio Romeo, Feb 01 '11 at 16:19
10 seconds for 11 nodes? This is definitely too much even if your heuristics are bad. You should exaust all nodes well with a few miliseconds at worst. In this case I really suggest using poor man's profiler described in http://stackoverflow.com/questions/266373/one-could-use-a-profiler-but-why-not-just-halt-the-program — Suma, Feb 01 '11 at 16:27
Please; someone correct me if I'm wrong, but you have if (current == mEnd) { while (current != null) { solutionNodes.Add(current); current = current.Parent; } return solutionNodes; } Aren't mEnd and mStart already in the solutionNodes list? And won't your logic add them again? — Dave, Feb 01 '11 at 16:33

Eric Lippert · Accepted Answer · 2011-02-01T18:04:28.277

First off, use a profiler. Use tools to tell you what is slow; it is often surprising what is slow. And use a debugger. Make a map with five nodes in it and step through every line of the code as it tries to find the path. Did anything unexpected happen? If so, figure out what is going on.

Second, leaving aside your performance problem, I think you also have a correctness problem. Can you explain to us why you believe the Manhattan distance is a reasonable heuristic?

(For those reading this who are unfamiliar with the metric, the "Manhattan distance" or "taxi distance" is the distance between two points you'd have to travel if you lived in a city laid out on a grid. That is, to go 14 miles due northeast you'd have to travel 10 miles north and then 10 miles east for a total of 20 miles.)

The A* algorithm requires for its correctness that the heuristic always underestimates the actual distance required to travel between two points. If there are any "diagonal shortcut" streets in the graph then the Manhattan distance overestimates the distance on those paths and therefore the algorithm will not necessarily find the shortest path.

Why aren't you using the Euclidean distance as your heuristic?

Have you tried your algorithm using "always zero" as the heuristic? That is guaranteed to be an underestimate. (Doing so gives you an implementation of Dijkstra's Algorithm.)

Third, you seem to be doing an awful lot of sorting in this implementation. Surely you could be using a priority queue; that's a lot cheaper than sorting.

Fourth, I have an implementation of A* in C# 3 on my blog that you are welcome to use; no warranty expressed or implied, use at your own risk.

http://blogs.msdn.com/b/ericlippert/archive/tags/astar/

My code is very straightforward; the algorithm in my implementation looks like this:

var closed = new HashSet<Node>();
var queue = new PriorityQueue<double, Path<Node>>();
queue.Enqueue(0, new Path<Node>(start));
while (!queue.IsEmpty)
{
    var path = queue.Dequeue();
    if (closed.Contains(path.LastStep)) continue;
    if (path.LastStep.Equals(destination)) return path;
    closed.Add(path.LastStep);
    foreach(Node n in path.LastStep.Neighbours)
    {
        double d = distance(path.LastStep, n);
        var newPath = path.AddStep(n, d);
        queue.Enqueue(newPath.TotalCost + estimate(n), newPath);
    }
}

The idea is that we maintain a priority queue of paths; that is, a queue of paths that is always able to tell you the path so far with the least distance. Then we check to see if we've made it to our destination; if so, we're done. If not, then we enqueue a bunch of new paths based on their (under)estimated distance to the goal.

Fifth, that pseudocode in Wikipedia could be improved. The fact that my actual code is in many ways easier to follow than the pseudocode indicates that maybe there is too much detail in the pseudocode.

"Can you explain to us why you believe the Manhattan distance is a reasonable heuristic?" As he is not moving diagonal (expanding four neighbours, see comment to http://stackoverflow.com/questions/4864945/wikipedia-a-pathfinding-algorithm-takes-a-lot-of-time/4865257#4865257), Manhattan is correct. — Suma, Feb 01 '11 at 17:42
What is distance using and what is estimate using? Are they both using euclidean? — Vittorio Romeo, Feb 01 '11 at 19:37
@Vee: "distance" is any function that takes two nodes and computes the *exact* distance between them; "estimate" is a function that takes one node and estimates the distance from that node to the goal node. How those methods do their work is irrelevant to the implementation of A*, so long as they do their work correctly and efficiently. They might be a Euclidean distance if that made sense. Remember, A* can be used to find paths in *arbitrary* graphs; it doesn't have to be a real-world map. The Euclidean metric only makes sense in a coordinate system. — Eric Lippert, Feb 01 '11 at 20:15
If diagonal motion is allowed with the same cost as grid motion, the Euclidean distance (2-norm) is not the correct metric. The infinity-norm (`max(abs(x1-x2),abs(y1-y2))`) is. When diagonal motion is not allowed, the Manhattan distance (1-norm, `abs(x1-x2)+abs(y1-y2)`) is correct. — Ben Voigt, Feb 02 '11 at 20:06

Ben Voigt · Answer 2 · 2011-02-01T16:28:09.897

A couple notes:

List<T> is not optimized for removing the first element. Better would be to sort in the opposite order and take the last element. Or use a Stack<T> or Queue<T>.

List.Remove(current) is extremely inefficient. You already know the index you want to remove, don't search the entire list for the element.

Keeping openNodes sorted by inserting new nodes in the correct location should be much faster than resorting the entire list continually. Skipping the list sort breaks the entire algorithm by removing important invariants. You need to make the sort faster, not skip it.

The primary operation you're doing on closedNodes is a presence test, closedNodes.Contains(). Use a data structure that is optimized for that, e.g. Set<T>. Or better yet, put a closed flag field in each node and clear them all at the beginning of each pass. This will be significantly faster than doing a linear search through closedNodes in each iteration.

You shouldn't put anything in solutionNodes initially, both mEnd and mStart will get added during the final loop that traverses the path.

neighborNodes could be an IEnumerable<T> instead of a List<T>, since you never need the entire list at once. Using foreach would also be slightly faster than enumerating the list by index.

score 1 · Answer 3 · answered Feb 01 '11 at 16:20

1

You can compare it with (or just make use of) the A* implementation in quickgraph library:

QuickGraph.Algorithms.ShortestPath.AStarShortestPathAlgorithm<TVertex,TEdge>

answered Feb 01 '11 at 16:20

digEmAll

56,430
9
115
140

I didn't find any documentation on how to use that. TVertex should be my PGNode, right? What about TEdge? – Vittorio Romeo Feb 01 '11 at 16:29
@Vee: You should transform your map in a graph (basically each tile (node) is connected to the accessible neighbors throug an edge, e.g. `new Edge(source,target)`) – digEmAll Feb 01 '11 at 16:59

score 0 · Answer 4 · edited Feb 09 '14 at 20:35

0

What is the memory consumption like? Download the Red Gate tools. Use Performance Profiler to see where the most time is being spent and Memory Profiler to determine if you have any issues with memory leaks or object not being disposed quickly enough.

As @Rodrigo pointed out, you could have a large map to deal with. Nested loops are never expected to be performant.

edited Feb 09 '14 at 20:35

David Atkinson

5,759
2
28
35

answered Feb 01 '11 at 16:15

Dustin Davis

14,482
13
63
119

score 0 · Answer 5 · answered Feb 01 '11 at 16:37

0

You compute traversal node cost like this:

cost = current.G + 10;

Yet for heuristics you have a simple distance. Why not using the same distance even here? Depending on how far your nodes currently are, your heuristics may be a lot too low.

Another "detail" which might be wrong: current.GetNeighborNodes. How is this implemented? Does it return the same node as it should for the same location, so that the same node on different paths is shared, or does it always allocated a new node using new?

answered Feb 01 '11 at 16:37

Suma

33,181
16
123
191

current.GetNeighborNodes returns the 4 adjacent already-existing nodes. A 80x80 array of nodes is created with the map. – Vittorio Romeo Feb 01 '11 at 16:54
OK. What about the heuristics? If the distance between nodes is 1 and cost is 10, heuristics should be distance * 10, not distance. With underestimating heuristics your search will be close to Djisktra. – Suma Feb 01 '11 at 17:08

score -1 · Answer 6 · answered Jun 25 '13 at 19:43

Are you using a grid for your terrain representation ? If so, then the best heuristic in that case is Octile:

Heuristic Cost= (min(Differrence in x, Differrence in y) * square root of 2 + max(Differrence in x, Differrence in y) - min(Differrence in x, Differrence in y))

For grids, this will always be optimal. Unfortunate this heuristic isn't so well known.

Another useful tip is to choose a data structure for your open list to suit the size of your map. If your map is relatively small (100 x 100), then an unsorted vector will be the fastest way to go. To remove elements, simply do an iterator swap of the last element and the one you want to remove and call pop_back. If you have a bigger map, use a heap. You only care about the cheapest node, so sorting everything else will have no benefit. A heap inserts and sorts with complexity log n, perfect for medium and large data sets but slow for small ones.

Lastly, if speed is so great an issue, implemente Jump Point Search. It is, on average, 20 to 30 times faster than pathfinding A*, with no memory overhead (or so the research paper claims, haven't found any proof about that one). It will basically substitute the "find neighbors" step of A* with "find successors", so incorporating it into your code should be relatively simple.

Hope that helps.

Wikipedia A* pathfinding algorithm takes a lot of time

6 Answers6