2

Whenever I run the tarjans algorithm on any graph it always claims to have a cycle, for example this graph:

A -> B -> C

The algorithm will tell me there is a cycle:

[a]
[b]

When there is a cycle, for example:

A -> B -> C -> A

The output is quite strange:

[c, b, a]
[a]
[b]

Here's my implementation:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.stream.Collectors;

public class Tarjans {

    private static class Node {
        public int index = -1, lowLink = -1;
        public String name;

        public Node(String name) {
            this.name = name;
        }

        public String toString() {
            return name;
        }
    }

    HashMap<String, Node> nodes = new HashMap<>();
    HashMap<String, ArrayList<Node>> graph = new HashMap<>();

    private int index = 0;
    private ArrayDeque<Node> visited = new ArrayDeque<>();
    private HashSet<String> stack = new HashSet<>();

    public ArrayList<ArrayList<Node>> tarjan() {
        ArrayList<ArrayList<Node>> cycles = new ArrayList<>();
        for (String key : graph.keySet()) {
            Node n = nodes.get(key);
            if (n == null) {
                System.err.println("what is " + n + "?");
                return new ArrayList<ArrayList<Node>>();
            }

            ArrayList<Node> cycle = strongConnect(n);
            if (cycle.size() > 0) {
                cycles.add(cycle);
            }
        }
        return cycles;
    }

    private ArrayList<Node> strongConnect(Node node) {
        node.index = index;
        node.lowLink = index;
        index += 1;

        visited.push(node);
        stack.add(node.name);

        ArrayList<Node> neighbours = graph.get(node.name);
        if (neighbours == null) return new ArrayList<>();

        neighbours.forEach(n -> {
            if (n.index == -1) {
                strongConnect(n);
                node.lowLink = Math.min(node.lowLink, n.lowLink);
            }
            else if (stack.contains(n.name)) {
                node.lowLink = Math.min(node.lowLink, n.index);
            }
        });

        ArrayList<Node> cycle = new ArrayList<>();
        if (node.lowLink == node.index) {
            Node p = null;
            do {
                p = visited.pop();
                stack.remove(p.name);
                cycle.add(p);
            } while (p != node);
        }
        return cycle;
    }

    private void foo() {
        nodes.put("a", new Node("a"));
        nodes.put("b", new Node("b"));
        nodes.put("c", new Node("c"));

        // A -> B -> C -> A
        graph.put("a", new ArrayList<>(Arrays.asList(nodes.get("b"))));
        graph.put("b", new ArrayList<>(Arrays.asList(nodes.get("c"))));
        graph.put("c", new ArrayList<>(Arrays.asList(nodes.get("a"))));

        ArrayList<ArrayList<Node>> cycles = tarjan();
        for (ArrayList<Node> cycle : cycles) {
            System.out.println("[" + cycle.stream().map(Node::toString).collect(Collectors.joining(",")) + "]");
        }
    }

    public static void main(String[] args) {
        new Tarjans().foo();
    }

}

But I'm not sure where I'm going wrong. I've followed the wikipedia article on tarjans algorithm nearly 1:1 and the psuedocode. I'm very new to graph theory and graph algorithms, so I can't wrap my head around what is the mistake here.

fix for tarjan()

public ArrayList<ArrayList<Node>> tarjan() {
    ArrayList<ArrayList<Node>> cycles = new ArrayList<>();
    for (Node n : nodes.values()) {
        if (n == null) {
            System.err.println("what is " + n + "?");
            return new ArrayList<ArrayList<Node>>();
        }

        if (n.index == -1) {
            ArrayList<Node> cycle = strongConnect(n);
            if (cycle.size() > 0) {
                cycles.add(cycle);
            }   
        }
    }
    return cycles;
}
flooblebit
  • 477
  • 1
  • 3
  • 9
  • (Wikipedia keeps multiple algorithms under Tarjan's name, the [Strongly Connected Components](https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) algorithm not being the "default" one.) As presented in revision 1, the code contains syntax errors (and lacks `Node`). After a bit of clean-up, I don't get _one_ cycle (a, b), but two: ((a), (b)) - _if_ I don't put(c, emptyList()) into the graph. If I put both (a, b) and (b, a) into the graph, I _do_ get ((a, b), …), with "…" messy (handling of 'index' in later/recursive calls?). – greybeard Nov 21 '16 at 15:50
  • @greybeard I'm confused, what do you mean? – flooblebit Nov 21 '16 at 16:00
  • Uh the entire thing. What are you trying to say? Did you clean up the code? Was it the code in the revision on wikipedia? The last bit is kind of a question(?), but I don't understand what you're asking. – flooblebit Nov 21 '16 at 16:15
  • I don't understand how you handle it via. keyset. What do you mean here, can you elaborate please? Also, what do you mean by "(out for hours)"? – flooblebit Nov 21 '16 at 16:49
  • Method `tarjan()` still looks off: look at the end of the line starting `for (`, and at the end of the aligned line starting with a '}' (try copying the code from this post & inserting it in your development environment). With A, B, C entered as a circle, I get `[[Node, Node, Node], [Node], [Node]]`: it may be about time you started to include all of the code (&input) to reproduce your findings - a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). – greybeard Nov 21 '16 at 22:48
  • Okay @greybeard, I've added an example which is easier to run and I've fixed the outputs to be more accurate. – flooblebit Nov 21 '16 at 23:37
  • Most probably, you're thrown off by your own choice naming the nodes in a _strongly connected component_ just _cycle_. Then, there is one point where you are _not_ following the algorithm: wikipedia presents `for each v in V do if (v.index is undefined) then strongconnect(v)`, you are calling `strongConnect(n)` for each and every `Node n` you find looking up `key`s from `graph.keySet()` in `nodes` - without regard for `n.index.` – greybeard Nov 21 '16 at 23:58
  • @greybeard I think I see what you mean. The `for each (v, w) in E do` part specifically? I wasn't sure how to interpret that, I presume it meant all of the neighbours of the node. In my case with the graph as a HashMap, how would I do this properly? – flooblebit Nov 22 '16 at 00:03
  • @greybeard D'oh, I can't believe I missed that! Okay, so with a cycle A -> B -> C -> A, I get [c, b, a] which is what I'm after. With no cycle, i.e. A -> B -> C, I get [a]. Is this how it's supposed to be? I've updated my question to have the new tarjan() method. – flooblebit Nov 22 '16 at 00:21
  • `With no cycle, i.e. A -> B -> C, I get [a]. Is this how it's supposed to be?` _there's_ a good question - my answer 9½ hours after sunset: That's what I get from this implementation I can't any longer find a notable deviation from the blueprint with, and I don't see any more claim for _a_ to be listed in a component all by its own than for _b_ or _c_, which aren't. Rather than finding "the original" (presentation of Tarjan's), I'm going to chase some sleep. – greybeard Nov 22 '16 at 01:03
  • @greybeard Okay! Thank you for all your help :-) – flooblebit Nov 22 '16 at 01:11
  • See also [Understanding Tarjan's SCC algorithm](http://stackoverflow.com/a/11016584/3789665) – greybeard Nov 22 '16 at 11:26

1 Answers1

1

From the first revision of the code presented in the question, the problems boil down to nearly not quite being near enough: I've followed the wikipedia article on [Tarjan's Strongly Connected Components] algorithm nearly 1:1 and the pseudocode.
(And possibly naming (variables to hold) strongly connected component cycle: if edges (a, b), (a, c), (b, a), (b, c) and (c, a) belong to one graph, vertices/nodes a, b, and c are in one strongly connected component which is neither a cycle nor cycles that happen to (pairwise) share vertices.)

There has been calling strongConnect() for nodes already visited - fixed in revison 7.
As of revison 7, there still is not checking a node for qualifying as a strongly connected component whenever it has no neighbours/successors.
Handling a strongly connected component once found is not as easy as it could be: have a Set<Set<Node>> as a data member "of the algorithm(instance)" to just add it to.

Once you got your implementation working and the code cleaned up and commented, I suggest presenting it at CODE REVIEW: there are lots of opportunities to make everyone's life (as a (Java) coder) easier, starting with yours.

greybeard
  • 2,249
  • 8
  • 30
  • 66
  • All beginnings are difficult. – greybeard Nov 22 '16 at 10:18
  • I'm not sure what you mean by "have a Set> as a data member". What do I add to this? What kind of set, a hash set? Thanks for the CODE REVIEW idea, I never knew this little sub-community existed! :-) – flooblebit Nov 22 '16 at 10:54
  • `mean by "have a Set> as a data member`: `class Tarjan { …Set> scComponents = new HashSet<>();…}` `What do I add to this?`: `scComponents.add(stronglyConnectedComponent)` `What kind of set, a hash set?` It _should_ not matter, functionally: if you _want_ to do without (the possibility to) check for duplicates, a `Collection` would do. – greybeard Nov 22 '16 at 11:04
  • I don't see how this is different to adding them to an arraylist and returning them from the tarjan() method – flooblebit Nov 22 '16 at 11:07
  • Try returning _multiple_ connected components, from recursive calls to `strongConnect()`. – greybeard Nov 22 '16 at 11:09
  • Am I on the right track [here](http://pastie.org/pastes/10967454/text?key=wi5hyblwvtsdv5qxhddzwa)? – flooblebit Nov 22 '16 at 11:21
  • `on the right track [at pastie.org]?` yes, from glossing over. Don't write, never present uncommented code. – greybeard Nov 22 '16 at 11:26
  • Hmm... okay. It seems to sort of work, but I'm still getting lists that have only a single node in them. A -> B -> C -> A gives me [c, b, a] which is perfect, A -> B -> C gives me [a], [c], [b]...? This is with literally the same code in my pastie. – flooblebit Nov 22 '16 at 11:32
  • `A -> B -> C gives me [a], [c], [b]` - which _exactly_ lists the SCCs of that graph. Did I mention you might throw yourself off calling an SCC a cycle? – greybeard Nov 22 '16 at 11:41
  • Hmmm. I figured that might be the case. So I could handle that by ignoring any hash-set with just a single node? – flooblebit Nov 22 '16 at 11:47
  • It works! I can't thank you enough for your help, I would award you an infinite amount of internet points if I could :-) – flooblebit Nov 22 '16 at 11:50
  • (Don't underestimate voting, accepting an answer (and commenting): transfixed by bounties of 50, even 100 reputation points? That's 2 or 4 accepted answers, with as many up-votes - there may be _many_ up-votes for good answers, especially to good questions - _if_ the bounty _is_ awarded manually. If not, it gets halved…) – greybeard Nov 26 '16 at 11:49