32

I want to create a depth first search which I have been somewhat successful in.

Here is my code so far (Except my constructor, note the Vertex and Edge classes only contain properties, nothing important to post here):

private Stack<Vertex> workerStack = new Stack<Vertex>();
private List<Vertex> vertices = new List<Vertex>();
private List<Edge> edges = new List<Edge>();

private int numberOfVertices;
private int numberOfClosedVertices;
private int visitNumber = 1;

private void StartSearch()
{
    // Make sure to visit all vertices
    while (numberOfClosedVertices < numberOfVertices && workerStack.Count > 0)
    {
        // Get top element in stack and mark it as visited
        Vertex workingVertex = workerStack.Pop();
        workingVertex.State = State.Visited;

        workingVertex.VisitNumber = visitNumber;
        visitNumber++;

        numberOfClosedVertices++;

        // Get all edges connected to the working vertex
        foreach (Vertex vertex in GetConnectedVertices(workingVertex))
        {
            vertex.Parent = workingVertex;
            workerStack.Push(vertex);
        }
    }
}

private List<Vertex> GetConnectedVertices(Vertex vertex)
{
    List<Vertex> vertices = new List<Vertex>();

    // Get all vertices connected to vertex and is unvisited, then add them to the vertices list
    edges.FindAll(edge => edge.VertexSource == vertex && edge.VertexTarget.State == State.Unvisited).ForEach(edge => vertices.Add(edge.VertexTarget));

    return vertices;
}

Its working in the way that all vertices get visited, but not in the right order.

Here is a comparison of how mine gets visited compared to wikipedia: Comparison

Its seems mine is turned around and is beginning from right to left.

Do you know what causes it? (Also any advice on my implementation would be greatly appreciated)

EDIT: I got my answer, but still wanted to show the end result for the GetConnectedVertices method:

private List<Vertex> GetConnectedVertices(Vertex vertex)
{
    List<Vertex> connectingVertices = new List<Vertex>();

    (from edge in edges
     where edge.VertexSource == vertex && edge.VertexTarget.State == State.Unvisited
     select edge).
     Reverse().
     ToList().
     ForEach(edge => connectingVertices.Add(edge.VertexTarget));

    return connectingVertices;
}
TylerH
  • 20,799
  • 66
  • 75
  • 101
Dumpen
  • 1,622
  • 6
  • 22
  • 36

6 Answers6

61

It seems mine is turned around and is beginning from right to left. Do you know what causes it?

As others have noted, you are pushing the nodes-to-visit-next on the stack in order from left to right. That means they get popped off right-to-left, since a stack reverses the order. Stacks are last-in-first-out.

You can fix the problem by making GetConnectedVertices build a stack, not a list. That way the connected vertices are reversed twice, once when they go on the returned stack and once when they go on the real stack.

Also any advice on my implementation would be greatly appreciated

The implementation works, I suppose, but it has a great many fundamental problems. If I were presented that code for review, here's what I'd say:

First off, suppose you wanted to do two depth-first searches of this data structure at the same time. Either because you were doing it on multiple threads, or because you have a nested loop in which the inner loop does a DFS for a different element than the outer loop. What happens? They interfere with each other because both try to mutate the "State" and "VisitNumber" fields. It is a really bad idea to have what should be a "clean" operation like searching actually make your data structure "dirty".

Doing so also makes it impossible for you to use persistent immutable data to represent redundant portions of your graph.

Also, I notice that you omit the code that cleans up. When is "State" ever set back to its original value? What if you did a second DFS? It would immediately fail since the root is already visited.

A better choice for all these reasons is to keep the "visited" state in its own object, not in each vertex.

Next, why are all the state objects private variables of a class? This is a simple algorithm; there's no need to build an entire class for it. A depth first search algorithm should take the graph to search as a formal parameter, not as object state, and it should maintain its own local state as necessary in local variables, not fields.

Next, the abstraction of the graph is... well, its not an abstraction. It's two lists, one of vertices and one of edges. How do we know that those two lists are even consistent? Suppose there are vertices that are not in the vertex list but are on the edge list. How do you prevent that? What you want is a graph abstraction. Let the graph abstraction implementation worry about how to represent edges and find neighbours.

Next, your use of ForEach is both legal and common, but it makes my head hurt. It is hard to read your code and reason about it with all the lambdas. We have a perfectly good "foreach" statement. Use it.

Next, you are mutating a "parent" property but it is not at all clear what this property is for or why it is being mutated during a traversal. Vertices in an arbitrary graph do not have "parents" unless the graph is a tree, and if the graph is a tree then there is no need to keep track of the "visited" state; there are no loops in a tree. What is going on here? This code is just bizarre, and it is not necessary to perform a DFS.

Next, your helper method named GetConnectedVertices is a lie. It does not get connected vertices, it gets connected not-already-visited vertices. Methods whose names lie are very confusing.

Finally, this claims to be a depth first search but it doesn't search for anything! Where is the thing being searched for? Where is the result returned? This isn't a search at all, it's a traversal.

Start over. What do you want? A depth-first traversal of a graph given a starting vertex. Then implement that. Start by defining what you are traversing. A graph. What service do you need from a graph? A way of getting the set of neighbouring vertices:

interface IGraph
{
    IEnumerable<Vertex> GetNeighbours(Vertex v);
}

What is your method returning? A sequence of Vertices in depth-first order. What does it take? A starting vertex. OK:

static class Extensions
{
    public static IEnumerable<Vertex> DepthFirstTraversal(
        this IGraph graph, 
        Vertex start) 
    { ... }
}

We now have a trivial implementation of depth first search; you can now use the Where clause:

IGraph myGraph = whatever;
Vertex start = whatever;
Vertex result = myGraph.DepthFirstTraversal(start)
                       .Where(v=>something)
                       .FirstOrDefault();

OK, so how are we going to implement that method so it does a traversal without wrecking the graph's state? Maintain your own external state:

public static IEnumerable<Vertex> DepthFirstTraversal(
    this IGraph graph, 
    Vertex start) 
{
    var visited = new HashSet<Vertex>();
    var stack = new Stack<Vertex>();

    stack.Push(start);

    while(stack.Count != 0)
    {
        var current = stack.Pop();

        if(!visited.Add(current))
            continue;

        yield return current;

        var neighbours = graph.GetNeighbours(current)
                              .Where(n=>!visited.Contains(n));

        // If you don't care about the left-to-right order, remove the Reverse
        foreach(var neighbour in neighbours.Reverse()) 
            stack.Push(neighbour);
    }
}

See how much cleaner and shorter that is? No mutation of state. No mucking around with edge lists. No badly-named helper functions. And the code actually does what it says it does: traverses a graph.

We also get the benefits of iterator blocks; namely, if someone is using this for a DF search, then the iteration is abandoned when the search criteria are met. We don't have to do a full traversal if we find the result early.

vulcan raven
  • 32,612
  • 11
  • 57
  • 93
Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • Yes, this is allot better. Just one question though, do you recommend having my Vertex class contain which edges is connected for the vertex so the GetNeighbours method makes use of those edges to find its neighbours? By the way, I still marked Adrians answer as "Accepted answer" since it solved my original problem (I gave you a thumbs up though :-)) – Dumpen Apr 28 '11 at 13:26
  • 1
    @Dumpen: It depends on the nature of the graph. Suppose V and E are the number of vertices and edges. If the "branching factor" E/V -- that is, the average number of edges per vertex -- is low then your suggestion is efficient. If it is high -- because every vertex is connected to almost every vertext -- then it is inefficient to store all that information per vertex; instead make a 2-D "edge table" of Booleans and have the graph maintain that. – Eric Lippert Apr 28 '11 at 14:07
  • All right, thank you once again. I got your code working, but one last question (I hope): Why do you declare the hashset and stack with "var" instead of letting the compiler know what the data-type is? – Dumpen Apr 28 '11 at 14:47
  • 7
    @Dumpen: The compiler doesn't need my help; it knows what the data type is. The question of whether or not to use "var" in this case is whether the *humans* need help figuring it out. You're a human; do you really need to see "Stack stack = new Stack()" in order to figure out that stack is a stack of vertices? I doubt it. The redundancy makes the code harder to read, not easier, so eliminate it. For more thoughts on this subject see http://blogs.msdn.com/b/ericlippert/archive/2011/04/20/uses-and-misuses-of-implicit-typing.aspx – Eric Lippert Apr 28 '11 at 14:53
  • 1
    For more information about your question on how to represent graphs, see http://en.wikipedia.org/wiki/Graph_(data_structure) -- you are suggesting the "adjacency list" approach which is good for sparse graphs. The "adjacency matrix" approach is more efficient for dense graphs. – Eric Lippert Apr 28 '11 at 14:56
  • @EricLippert, Doesn't this contain a bug where a Vertex can be visited more than once since the visited set is only checked when adding to the stack (but not checked at the time of visiting), so you could get two duplicate Vertex objects on the stack at once. Example A->B, A->C, B->C. In this example C can get visited twice. – Matt Smith Sep 11 '13 at 14:07
  • @vulcanraven. Bug is gone, but now when it sees a visited node it has to re-go through all the neighbors (deeply) and find that they are already visited. In my implementation, I just check the return value of `Add` and if it is not new, continue. I.e. `if (!visited.Add(current)) continue; yield return current;` – Matt Smith Jan 20 '14 at 14:21
  • Can someone explain what this line means? `var neighbours = graph.GetNeighbours(current) .Where(n=>!visited.Contains(n));` Particularly `.Where(n=>!visited.Contains(n))`. – Atlas2k Dec 22 '17 at 04:07
  • 2
    @Atlas2k: Can you ask a more specific question? The line of code means "get the neighbours I haven't visited yet". We represent this by having a set of visited nodes, and we're getting all the neighbours that are not yet in the set of visited nodes. Are you asking why we don't want to visit the nodes we've already visited? Because the problem is "visit each node in depth first order". – Eric Lippert Dec 22 '17 at 15:14
  • My question is even more basic than that. I don't know what 'n=>' represents, and it was difficult to search for this set of characters in Google. Thanks for the explanation. – Atlas2k Dec 22 '17 at 19:00
  • 2
    @Atlas2k: The keywords you want to search for are "lambda expression" and "LINQ". – Eric Lippert Dec 23 '17 at 14:47
  • @EricLippert The last code snippet looks a lot like a Breadth-First Traversal rather than a Depth-First Traversal. Isn't it a BF traversal? – Ziezi Jun 02 '18 at 14:00
  • 2
    @ziezi your question is easy to answer yourself. Create a graph that is in the form of a tree and run the algorithm. What order did it traverse the nodes in? – Eric Lippert Jun 03 '18 at 02:45
6

I generalized @Eric's code for DFS traversal for any T to make things work for any type that has children - I thought I'd share:

public static IEnumerable<T> DepthFirstTraversal<T>(
    T start,
    Func<T, IEnumerable<T>> getNeighbours)
{
    var visited = new HashSet<T>();
    var stack = new Stack<T>();
    stack.Push(start);

    while (stack.Count != 0)
    {
        var current = stack.Pop();

        if (!visited.Add(current))
            continue;

        yield return current;

        var neighbours = getNeighbours(current).Where(node => !visited.Contains(node));

        // If you don't care about the left-to-right order, remove the Reverse
        foreach(var neighbour in neighbours.Reverse())
        {
            stack.Push(neighbour);
        }
    }
}

Example usage:

var nodes = DepthFirstTraversal(myNode, n => n.Neighbours);
sapbucket
  • 6,795
  • 15
  • 57
  • 94
Adi Lester
  • 24,731
  • 12
  • 95
  • 110
1

The problem is in the order you search the elements. Your for each in StartSearch does not guarantee element order. Neither does you FindAll in GetConnectedVertices method. Let's look at this line:

edges.FindAll(edge => edge.VertexSource == vertex && edge.VertexTarget.State == State.Unvisited).ForEach(edge => vertices.Add(edge.VertexTarget));

You should add an OrderBy() to ensure the desired order.

Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
  • Reversing the order of the list (Using Reverse()) seems to solve the problem. Thank you for the answer, got me in the right direction. – Dumpen Apr 27 '11 at 15:11
  • 1
    @Dumpen: You can do it more efficiently than using Reverse. Build another stack! That way it is automatically enumerated "backwards". – Eric Lippert Apr 27 '11 at 15:17
  • Ohh thats cleaver. You mean like this? ` edges.FindAll(edge => edge.VertexSource == vertex && edge.VertexTarget.State == State.Unvisited). ForEach(edge => connectingVertices.Push(edge.VertexTarget)); ` (Sorry I seem to fail at making code formatting) – Dumpen Apr 27 '11 at 15:31
0

This is my implemenation, one stack is good enough. A reverse is done before the foreach loop.

    /// <summary>
    /// Depth first search implementation in c#
    /// </summary>
    /// <typeparam name="T">Type of tree structure item</typeparam>
    /// <typeparam name="TChilds">Type of childs collection</typeparam>
    /// <param name="node">Starting node to search</param>
    /// <param name="ChildsProperty">Property to return child node</param>
    /// <param name="Match">Predicate for matching</param>
    /// <returns>The instance of matched result, null if not found</returns>
    public static T DepthFirstSearch<T, TChilds>(this T node, Func<T, TChilds> ChildsProperty, Predicate<T> Match) 
        where T:class
    {
        if (!(ChildsProperty(node) is IEnumerable<T>))
            throw new ArgumentException("ChildsProperty must be IEnumerable<T>");

        Stack<T> stack = new Stack<T>();
        stack.Push(node);
        while (stack.Count > 0) {
            T thisNode = stack.Pop();
            #if DEBUG
            System.Diagnostics.Debug.WriteLine(thisNode.ToString());
            #endif
            if (Match(thisNode))
                return thisNode;
            if (ChildsProperty(thisNode) != null) {
                foreach (T child in (ChildsProperty(thisNode) as IEnumerable<T>).Reverse()) 
                    stack.Push(child);
            }
        }
        return null;
    }
Lepton
  • 37
  • 1
  • 4
0

Items will be popped of the stack in the reverse order as how they are pushed on it:

stach.push() results in: 1 2 3 4 5

stack.pop() results in: 5 4 3 2 1 (so: from right to left)

Flawless
  • 127
  • 7
0

You might enjoy this:

        public static bool DepthFirstSearch<T>(this IEnumerable<T> vertices, T rootVertex, T targetVertex, Func<T, IEnumerable<T>> getConnectedVertices, Func<T, T, bool> matchFunction = null)
    {
        if (getConnectedVertices == null)
        {
            throw new ArgumentNullException("getConnectedVertices");
        }
        if (matchFunction == null)
        {
            matchFunction = (t, u) => object.Equals(t, u);
        }
        var directlyConnectedVertices = getConnectedVertices(rootVertex);
        foreach (var vertex in directlyConnectedVertices)
        {
            if (matchFunction(vertex, targetVertex))
            {
                return true;
            }
            else if (vertices.DepthFirstSearch(vertex, targetVertex, getConnectedVertices, matchFunction))
            {
                return true;
            }
        }
        return false;
    }
Evan Machusak
  • 694
  • 4
  • 9
  • Thank you. I will certainly look into this, however at the moment I seem to have enough in my head with the current script :) – Dumpen Apr 27 '11 at 15:11
  • recursion isn't always a good idea, I'd be careful using this solution. – vvondra Nov 09 '11 at 13:13