1

What algorithm is used to find the longest path thru a directed cyclic unweighted graph. Each node points to only one other node. The graphs have up to 10^9 nodes. (I searched here and google to no avail.)

Jim Lewis
  • 349
  • 2
  • 14
  • Each node points only to one node? That would make this a series of nodes, no a graph, and thus the question of the longest path would seem irrelevant... – Lucero Dec 12 '16 at 09:49
  • Possible duplicate of [Longest acyclic path in a directed unweighted graph](http://stackoverflow.com/questions/2525316/longest-acyclic-path-in-a-directed-unweighted-graph) – Lucero Dec 12 '16 at 09:50
  • The nodes can be thought of as falling on circles of varying sizes and I need the largest circle. – Jim Lewis Dec 12 '16 at 10:07
  • Ok, so you don't have one single graph but rather a series of distinct graphs which each has the form a closed chain with various number of nodes, right? – Lucero Dec 12 '16 at 10:13
  • Yes, that is correct. – Jim Lewis Dec 12 '16 at 10:36
  • Since you've mentioned that the graph van have up to 10^9 nodes, does that imply that you have some memory limitations or `O(n)` complexity (both time and memory) is fine? – Nikola Stojiljkovic Dec 12 '16 at 10:53
  • @lucero "closed chain with various number of nodes" the possible configuration that I see are: straight rope, closed looped rope, lasso rope (the end reconnects to the middle of the rope. Thing the grapheme for 6 or 9) – Adrian Colomitchi Dec 12 '16 at 14:00
  • O(n) would be great and I can afford O(n) memory. But see my comment below withdrawing my circles claim. – Jim Lewis Dec 12 '16 at 21:49
  • @JimLewis Can you read Java or C/C++? If positive, which one you prefer? – Adrian Colomitchi Dec 12 '16 at 22:20
  • I know C++ (currently using Python FWIW). – Jim Lewis Dec 12 '16 at 23:19
  • @JimLewis By the way, there's another topology possible: "ring with string attached" - where the string are "incoming connection" – Adrian Colomitchi Dec 13 '16 at 00:14
  • @JimLewis Example of ["ring with string attached"](http://stackoverflow.com/questions/41120347/trim-the-octopus-remove-all-branches-of-a-digraph-not-part-of-a-cycle-in-on). What should the algo output from something like this? – Adrian Colomitchi Dec 13 '16 at 11:55

3 Answers3

0

So you don't have one single graph but rather a series of distinct graphs which each form a closed chain with various number of nodes.

If that is the case, you can implement an algorithm which roughly uses O(n) time complexity and O(n) space complexity (assuming random access to all nodes and that each node has an ascending ID).

Start at the first node, and traverse the chain until you are back on the first node. For each visited node, mark it with a chain identifier. Store the number of visited nodes for this chain ID. Then you go to the next node (by ID, not in the chain), check if it has already been marked as being part of a chain. If yes, move on; if not, process the chain. Do that until you're on the last node ID. At this point you're done and you know the length of all chains, and then you pick the longest one.

Lucero
  • 59,176
  • 9
  • 122
  • 152
  • "Start at the first node, and traverse the chain until you are back on the first node." Fails for "straight rope" (if the interpretation is: "node points to at most another node" if not there's no straight rope) but you missed the "lasso configuration" too (the end reconnects alright, not with the start but with some other node in the middle). E.g. 1->2->3->4->2 – Adrian Colomitchi Dec 12 '16 at 14:05
  • @AdrianColomitchi I wrote my answer after the OP clarified the data: "The nodes can be thought of as falling on circles of varying sizes and I need the largest circle" - thus no "straight rope" or "lasso" configuration. – Lucero Dec 12 '16 at 16:04
  • Very sorry for my error but from Lucero's reference to "lasso", although I was correct in stating that "each node points to only one other node" I now realize that a node can have multiple in-pointing nodes and hence I was incorrect in visualizing as multiple circles. – Jim Lewis Dec 12 '16 at 21:17
  • @jimlewis And what are you interested in, longest lasso or largest circle? Note that the algorithm I sketched also works for the lasso case with a minor adjustment (when processing the chain, if a node visited is already marked as part of another chain, mark the two parts as being the same chain, and move on). – Lucero Dec 13 '16 at 01:02
  • Longest lasso. OK, I think I have enough to proceed. Tnx! – Jim Lewis Dec 13 '16 at 08:31
0

First recursively remove every vertex of in-degree zero (in O(n)). The resulting graph is just a disjoint union of cycles.

Take arbitrary node, run dfs, and find the length of the cycle it belongs to (just by visiting neighbour, a natural dfs). Continue this for every unvisited node. At the end you can output the largest cycle.

Saeed Amiri
  • 22,252
  • 5
  • 45
  • 83
  • "Your graph is just a disjoint union of cycles." Wrong. Consider 1->2->3->4->2. – Adrian Colomitchi Dec 12 '16 at 14:06
  • Yes, but that was easy to fix. See the updated answer. – Saeed Amiri Dec 12 '16 at 19:46
  • "First recursively remove every vertex of in-degree zero (in O(n))." Is it O(N) really? Can you please exemplify with 1->2->....->998->999->998 assuming that you picked 999 as the start node to look for predecessors? – Adrian Colomitchi Dec 12 '16 at 22:14
  • @AdrianColomitchi, you need a basic course on algorithms. Is 999 a vertex with in degree zero? – Saeed Amiri Dec 13 '16 at 01:09
  • "Is 999 a vertex with in degree zero?" Do you have an algo to pick a node with 0 in-degree in **O(1)** time when your nodes only store a ref to the successor? I might start my basic course on algorithms by studying that marvel that you magically popped from the hat. – Adrian Colomitchi Dec 13 '16 at 01:25
  • Here's another example for you to start explaining the magic algo. Zero-based indexing, an array containing the 'next index' of the current node: [1,0,1,2,3,.., 998]. You have a two element loop on the 0 and 1, then 998 elements pointing to their predecessor as their next node. Show me how you get to the stripped down cycle in O(N) without magically picking the last element as "Yeah, well... I have a hunch, I'll start from the end this time". – Adrian Colomitchi Dec 13 '16 at 01:40
  • @Adrian, did you understand the first phase of the algorithm? Can you find vertices of in degree zero? Can you remove them recursively and proceed further? Did you ever passed a course on elementary algorithms? In which university or colleague? I should write an email to professors who allow you to pass the course. – Saeed Amiri Dec 13 '16 at 02:08
  • I just noticed that you are not from computer science community. Sorry if I bothered you, but these are really basic. – Saeed Amiri Dec 13 '16 at 02:17
  • "Answer my question, did you ever passed a course an basic algorithms" No. "... or even did you ever read a book about basic algorithms and data structures?" Yes. "I just noticed that you are not from computer science community." You are correct. "Do you know what is adjacency list?" An adjacency matrix in sparse representation (discarding non-connected nodes). – Adrian Colomitchi Dec 13 '16 at 02:46
  • Would you be so kind and actually implement something for the answer? Just to actually see the complexity without relying on "magister dixit" - proof in the pudding, so to say. Feeling of (non-CS) guts, your "recursive 0 in-degree striping" will get you to the L^2, where L is the length of non-looped elements. – Adrian Colomitchi Dec 13 '16 at 02:56
  • I still think that eliminating all the string that lead to the loop (can be many) is going to cost you L^2 no matter what structures you'll be using. If you think otherwise, so be it. – Adrian Colomitchi Dec 13 '16 at 09:30
  • No deal, mate, on the betting side. – Adrian Colomitchi Dec 13 '16 at 09:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/130478/discussion-between-saeed-amiri-and-adrian-colomitchi). – Saeed Amiri Dec 13 '16 at 09:44
  • "or provide a concrete example such that I see what you mean to be able to explain why you are wrong." Ok Saeed, I'll prepare an example for the "First recursively remove every vertex of in-degree zero **(in O(n))**." subproblem and post it as a question. Then I'll comment on this again with the reference to the question. This will give us more space than the lousy comments. Good enough for you? – Adrian Colomitchi Dec 13 '16 at 09:49
  • [For your pleasure](http://stackoverflow.com/questions/41120347/trim-the-octopus), magister. I linked to the example you asked for. – Adrian Colomitchi Dec 13 '16 at 11:48
  • @AdrianColomitchi, here is the implementation for what I said (in the first part of the algorithm), I didn't want to make it publicly available as it was really foolish question. But as you didn't accept the answer to your question in other post I leave it here. I'm sorry for myself to talk too low level about algorithms. Implement it to see if it is correct! Funny! Low level world. http://ideone.com/ZDj7k6 – Saeed Amiri Dec 14 '16 at 11:34
  • Yeap, it's valid and pretty much clear. The only beef I have with it: you mentioned a recursive algo and you presented an iterative one. It's also a valid answer to the question I raised. "I leave it here" - tell you what: there will be many non-CS graduates having the same problem and searching for an answer - for their sake, do publish this as an answer to my question; low level or not, the world will be a slightly better place. "But as you didn't accept the answer to your question in other post" - that's because I want more answers, and I intend to honour my promise to put a bounty on it. – Adrian Colomitchi Dec 15 '16 at 01:39
  • @AdrianColomitchi, Recursive does not mean always to call a function recursively or use stack directly etc. e.g. here in that code the while loop: while (current != null && current.indegree == 0) is actually the recursive call. – Saeed Amiri Dec 15 '16 at 17:49
  • Also I cannot sketch a border line to separate recursive and iterative, but the spirit of the algorithm in my opinion is recursive (or recursive induction), while someone else may believe that it is not like that. – Saeed Amiri Dec 15 '16 at 18:02
  • Bounty [opened](http://stackoverflow.com/questions/41120347/trim-the-octopus-remove-all-branches-of-a-digraph-not-part-of-a-cycle-in-on). Please do post your solution. – Adrian Colomitchi Dec 15 '16 at 22:24
0

See the tortoise and hare loop detection - you send one iterator with step increment of 1 (tortoise) and another one with step increment of two (hare). If the list has a cycle, they are bound to meet. (also in another question on SO).

  1. Take a node
  2. start the "tortoise and hare", marking the nodes visited by the tortoise (O(N) space complexity, array of bools suffice)
  3. once the tortoise meets the hare, once the hare steps onto a node already visited by the tortoise (i.e. both are inside a loop), stop the hare, bring the tortoise at the same position as the hare and let the tortoise go around the loop once more and count the length of the loop (to check against the maximum so far)

  4. take another not yet visited node and go to step 2


C++ solution

size_t maxLoopLen(const std::vector<size_t>& nextNodeIndexes) {
  size_t len=nextNodeIndexes.size();
  std::vector<bool> visitTrace(len, false);
  size_t ret=0; // the max number of elements in the loop
  for(;;) {
    // find the first non-visited node
    size_t pos=0;
    for(pos=0; pos<len && visitTrace[pos]; pos++);
    if(pos>=len) { // no more unvisited nodes
      break;
    }

    // this is needed for the "ring with string attached" topology
    // The global visitTrace contains the exploration of the prev
    // loos or **string leading to the same loop** - if the hare
    // steps on one of those prev strings, it may stop prematurely
    // (on the string, not inside the loop)
    std::vector<bool> currCycleTrace(len, false);

    size_t hare=pos, tortoise=pos;
    bool hareOnKnownPosition=false;
    while ( !currCycleTrace[hare] && !hareOnKnownPosition) {
      if(visitTrace[hare]) {
        // the hare just got to revisit something visited on prev cycles
        // *** ***********************************************************
        // *** this is where the algo achieves sub-O(N^2) time complexity
        // *** ***********************************************************
        hareOnKnownPosition=true;
        break;
      }
      // mark the tortoise pos as visited
      visitTrace[tortoise]=currCycleTrace[tortoise]=true;
      // tortoise steps with increment of one
      tortoise=nextNodeIndexes[tortoise];
      // hare steps two
      hare=nextNodeIndexes[hare];
      hare=nextNodeIndexes[hare];
    }
    // we got out of that cycle because the hare stepped on either:
    // - a tortoise-visited place on the current cycle - in this case
    //   both the tortoise and the hare are inside a not-yet-explored
    //   loop.
    // - on a place where the tortoise has been when it discovered a
    //   loop at prev cycles (think "ring with multiple string attached"

    if(!hareOnKnownPosition) {
      // The hare stepped on a new loop, not a loop visited before

      // Bring the tortoise to the same position as the hare. keep the
      // hare still and start counting how many steps until the tortoise
      // gets back to the same place
      tortoise=hare;
      size_t currLoopElemCount=0;
      do {
        tortoise=nextNodeIndexes[tortoise];
        currLoopElemCount++;
      } while(tortoise!=hare);
      ret=std::max(currLoopElemCount, ret);
    }
  }
  return ret;
}

#include <iostream>
int main() {
  std::vector<size_t> lasso={3,3,1,2,0};
  // expected 3, with cycle at nodes at indexes 1,2,3
  std::cout << "lasso max loop len " << maxLoopLen(lasso) << std::endl;

  // expected 2. The ring index 1 and 2. Two connected strings
  // - starting at index 0 - 0->3->2 and we are inside the ring
  // - starting at index 4 - 4->1  and we are inside the ring
  std::vector<size_t> ringWith2Strings={3,2,1,2,1};
  std::cout << "ringWith2Strings max loop len " 
            << maxLoopLen(ringWith2Strings) << std::endl;

  std::vector<size_t> singleElem={0};
  std::cout << "singleElem max loop len " << maxLoopLen(singleElem) << std::endl;

  std::vector<size_t> allTogether={
    3,3,1,2,0, // lasso
    8,7,6,7,6, // ringWith2Strings shifted up 5 pos
    10 // single element pointing to itself
  };
  std::cout << "allTogether max loop len " << maxLoopLen(allTogether) << std::endl;
}

Example of exploring the "nodes"

lasso={3,3,1,2,0};
  • node 4 says "go to node 0"
  • node 0 says "go to node 3"
  • node 3 says "go to node 2"
  • node 2 says "go to node 1"
  • node 1 says "go to node 3" (loop)
Community
  • 1
  • 1
Adrian Colomitchi
  • 3,974
  • 1
  • 14
  • 23
  • 2
    I don't see why one should use tortoise and hare algorithm while there is a quite simple algorithm for this problem. Also it is irrelevant as we don't want to just find a cycle. – Saeed Amiri Dec 12 '16 at 19:45
  • @SaeedAmiri " as we don't want to just find a cycle." And where does it skip finding the other things? (did you miss step 3? I just emphasized it) – Adrian Colomitchi Dec 12 '16 at 21:53
  • First of all there is a simple, fast, straightforward algorithm that one don't need to analyse it like tortoise and hare algorithm, second, still it is not clear for me if it works correctly. The question still remains, if you use tortoise and hare algo to find cycles why you don't use normal dfs?! – Saeed Amiri Dec 13 '16 at 01:16
  • @SaeedAmiri "If you use tortoise and hare algo to find cycles why you don't use normal dfs?!" As implemented above, the method * **is DFS** with a second pointer (the hare)* (the tortoise does DFS). The use of this extra pointer make easier get a position where to start counting the nodes in the loop - once a new loop is detected. – Adrian Colomitchi Dec 13 '16 at 02:28