Avoiding stack overflow when implementing depth first search in c++

Question

connected components of triangle faces in a mesh

I have implemented depth first search using the above link and works fine for most of my data samples. When my data sample is very large, the code reaches a break point, likely to be stack overflow as the recursive function gets too deep. Is there any way to avoid this? or do I have to use some other way like breadth first search / union find algorithm to find connected components in a graph.

#include <bits/stdc++.h>
using namespace std;
 
class Graph {
 
    // A function used by DFS
    void DFSUtil(int v);
 
public:
    int count;
    map<int, bool> visited;
    map<int, list<int> > adj;
    // function to add an edge to graph
    void addEdge(int v, int w);
 
    // prints DFS traversal of the complete graph
    void DFS();
};
 
void Graph::addEdge(int v, int w)
{
    adj[v].push_back(w); // Add w to v’s list.
}
 
void Graph::DFSUtil(int v)
{
    // Mark the current node as visited and print it
    visited[v] = true;
    cout << v << " ";
 
    // Recur for all the vertices adjacent to this vertex
    list<int>::iterator i;
    for (i = adj[v].begin(); i != adj[v].end(); ++i)
        if (!visited[*i])
            DFSUtil(*i);
}
 
// The function to do DFS traversal. It uses recursive
// DFSUtil()
void Graph::DFS()
{   
    count = 0;
    // Call the recursive helper function to print DFS
    // traversal starting from all vertices one by one
    for (auto i : adj)
        if (visited[i.first] == false)
         {
            DFSUtil(i.first);
            count++;
         }
}
 
int main()
{
    // Create a graph given in the above diagram
    Graph g;
    for face in faces :
    {
       g.addEdge(face[0], face[1]);
       g.addEdge(face[0], face[2]);
       g.addEdge(face[1], face[0]);
       g.addEdge(face[1], face[2]);
       g.addEdge(face[2], face[0]);
       g.addEdge(face[2], face[1]);
    }
    
 
    cout << "Following is Depth First Traversal \n";
       
      // Function call
    g.DFS();
    cout << "number of connected components = " << g.count << "\n";
    return 0;
}

*Is there any way to avoid this?* -- Rewrite the function using iteration (loop), not recursion. — PaulMcKenzie, Sep 28 '22 at 14:58
_"the code reaches a break point"_ What does a "break point" look like? Can you describe what you are actually observing? — Drew Dormann, Sep 28 '22 at 15:03
If I'm reading the code correctly, the only things recursively placed on the stack are the argument `int v` and a `list::iterator`. If that's what is overflowing your stack, you could move them off of the regular stack and into a separate FILO data structure (such as a `std::stack`) that you manually push to (before making a recursive call) and pop from (before the recursive call returns). That would have the advantage of allowing the (pseudo-)stack's size to dynamically increase (via a heap reallocation) when necessary, rather than causing a stack overflow. — Jeremy Friesner, Sep 28 '22 at 15:04
@ravenspoint The one that fail has about 200k node. But they are largely connected so i guess it causes the program to fail during recursion. I have another sample with more than 250k node but succeed as they are less connected. How do you increase the stack size? — Dong, Sep 28 '22 at 15:04
@Dong do not increase the stack size. Such operation is a very red flag that you are doing something wrong. And indeed, the correct solution is to rewrite the code to use heap instead of stack. — freakish, Sep 28 '22 at 15:13
@Dong -- And when you get one with 500,000 nodes, or a million nodes, do you again have to rebuild your program using a larger stack size? Why not fix it once and for all, and that is to rewrite the code using iteration. Then the issue becomes a matter of how fast the iterative solution is, and no need to worry about the stack being exhausted. — PaulMcKenzie, Sep 28 '22 at 15:33
@PaulMcKenzie Yes, it is so very tedious to have to rebuild a program from time to time. — ravenspoint, Sep 28 '22 at 15:50
@ravenspoint you mean you're fine with having customers call you at 3 in the morning so you send them a new build with bigger stack? By the way, how do you track which version each customer has so you fetch the proper debugging symbols when you need to investigate a crash? — spectras, Sep 28 '22 at 15:55
My rule: *the maximum recursion level for a recursive function shall be **O(log n)***. Having a **O(n)** level is asking for trouble: it will quickly crash with (not so) big `n`. And this type of recursive function can be very easily converted to a loop, that is easier to understand, easier to debug, safer and faster. — prapin, Sep 28 '22 at 16:01
@spectras Obviously I have a quick test that the number of nodes is reasonable for the stack size I deliver. BTW answering the phone at 3am is called self defeating behaviour. — ravenspoint, Sep 28 '22 at 16:02
[Why should I not #include ?](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) — Jesper Juhl, Sep 28 '22 at 16:45
[Why is "using namespace std;" considered bad practice?](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) — Jesper Juhl, Sep 28 '22 at 16:46
@ravenspoint and obviously your test accounts for whatever use case the customers have and will have in the upcoming years. Well, glad to hear that. — spectras, Sep 28 '22 at 17:17

score 5 · Accepted Answer · answered Sep 28 '22 at 15:11

5

With a recursive algorithm, there is no way to avoid a stack overflow. But, you need to re-implement your algo with an iterative version using a stack.

Here is a rough implementation .

void Graph::DFSUtil(int v)
{
    stack<int> stack;
    stack.push(v);
    cout << v << " ";
    while (!stack.empty())
    {
        int v = stack.top();
        stack.pop();
        if (!visited[v])
        {
            cout << v << " ";
            visited[v] = true;
        }
        for (int i = adj[v].begin(); i != adj[v].end(); ++i)
            if (!visited[*i])
                stack.push(*i);
    }
}

answered Sep 28 '22 at 15:11

SysAdmin

5,455
8
33
34

@ravenspoint - Relying on the application stack has its limit. If the input size (graph in this case) is unknown, then there is always the risk of hitting this upper limit. We are not re-inventing the wheel here. We are making the program reliable by making use of the heap memory instead of the stack memory. – SysAdmin Sep 28 '22 at 16:25
Why would the input size be unknown? – ravenspoint Sep 28 '22 at 16:31
The stack overflow is easily and simply solved by increasing the stack size ( -Wl,--stack,33554432 ). Performance is the important issue. The version using std::stack runs much faster, much to my surprise. On a 400,000 node graph a complete DFS takse 5 seconds v the recursive version taking 30 secs. Details at https://github.com/JamesBremner/PathFinder – ravenspoint Sep 28 '22 at 17:26

Avoiding stack overflow when implementing depth first search in c++

1 Answers1