3

Preliminaries - safe to skip: This question is in relation with Longest Path for Directed Cyclic Graph. Now, there was a bit of a controversy in there, about how trivial or not trivial would be to isolate only the cycles in a directed graph, iterative or recursive, etc, so I decided to open this as a question - I guess there will be other non-CS graduates (like me) who may benefit in the futures from answers that are clear and well explained.

The question is now "academic" so to speak - the purpose is to get an as large as possible set of answers :

Given a totally connected graph obeying the "exactly one child node for every node" rule (therefore exactly one cycle must exist) remove all nodes not part of the cycle in O(N).

The algo should minimally answer to the question of "what is the length of the cycle", the identity/indexes of the node a bonus. Considerations about complexity (why is O(N)) will be welcomed.

A bonus if the algo can deal with non-totally connected graphs and identify all the cycles (not only the nodes belonging to any cycle, but also separating/tagging them with a cycle identifier).

The more solutions which are clear and well explained, the better for future answer seekers

This is why I'm putting on a bounty (in SO gold) and relaxing the language requirements.

An example in the image below

Cyclic graph

The nodes are connected from low-to-high along the lines.

If I'm not mistaken, an adjacency vector (index designates the node, value designates the node.next) would read

   [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 80,
     11, 12, 13, 14, 15, 16, 17, 18, 19, 81,
     21, 22, 23, 24, 25, 26, 27, 28, 29, 82,
     31, 32, 33, 34, 35, 36, 37, 38, 39, 83,
     41, 42, 43, 44, 45, 46, 47, 48, 49, 84,
     51, 52, 53, 54, 55, 56, 57, 58, 59, 85,
     61, 62, 63, 64, 65, 66, 67, 68, 69, 86,
     71, 72, 73, 74, 75, 76, 77, 78, 79, 87,
     81, 82, 83, 84, 85, 86, 87, 80 ]

Other test data:

  • the singularity - [0] a single node "curled" onto itself (like the extra dimension in "String theory"?)
  • active blackhole - [1,2,2,2] - a singularity at index 2 in which both of the left and right falls
  • inverted lasso - [1,0,1,2,3] - start with the loop and has the tail leading towards it
Community
  • 1
  • 1
Adrian Colomitchi
  • 3,974
  • 1
  • 14
  • 23
  • @AdrianColomitchi The sample data could be improved by adding some nodes which are part of the cycle, but which have no other nodes pointing to them (e.g. a 88 or so node). – Lucero Dec 13 '16 at 13:16
  • @Lucero, there is no problem with the second answer, except that it only implicitly answers the question, you should explicitly write that what I said is actually correct (or what is asked in the question is correct), and then provide a code or prove it. After that I can remove my downvote – Saeed Amiri Dec 13 '16 at 13:17
  • @Lucero "The sample data could be improved by... [etc]" you mean nodes with no strings attached? Why would this make the problem harder? – Adrian Colomitchi Dec 13 '16 at 13:20
  • @AdrianColomitchi Somewhat, yes. With the current data, the 1st pass alone of my solution returns the correct answer, but if you add a node which has no string attached the 2nd pass is needed. BTW the counting can be done in the 2nd pass as well. ;) – Lucero Dec 13 '16 at 13:28

4 Answers4

3

My answer was accepted in the original question, so here's my solution to the original question as JS snippet. Runtime O(n), Space O(n).

var data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 80,
  11, 12, 13, 14, 15, 16, 17, 18, 19, 81,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 82,
  31, 32, 33, 34, 35, 36, 37, 38, 39, 83,
  41, 42, 43, 44, 45, 46, 47, 48, 49, 84,
  51, 52, 53, 54, 55, 56, 57, 58, 57, 85,
  61, 62, 63, 64, 65, 66, 67, 68, 69, 86,
  71, 72, 73, 74, 75, 76, 77, 78, 79, 87,
  81, 82, 83, 84, 85, 86, 87, 80
];

var visited = new Array(data.length);

var chains = [];

for (var i = 0; i < data.length; i++) {
  if (!visited[i]) {
    var current = {
      chain: chains.length
    }
    var j = i;
    var len = 0;
    do {
      len++;
      visited[j] = current;
      j = data[j];
    } while (!visited[j]);
    if (visited[j].chain === current.chain) {
      chains.push(len);
    } else {
      current.chain = visited[j].chain;
      chains[current.chain] += len;
    }
  }
}

//the visited array now contains information about which chain each node belongs to
document.write("<table><tr>");
for (var i = 0; i < visited.length; i++) {
  document.write("<td>"+i + "=>" + visited[i].chain + "</td>");
  if (i % 10 == 9) {
    document.write("<tr/><tr>");
  }
}
document.write("</table></tr>");
document.write("<p>Chain lengths: " + chains+"</p>");

Note that the dataset posted did have an "error" in the octopus; 57 => 58 => 57 makes this a "lasso" which is separate from the "octopus".

And here's the solution to the question asked here:

Runtime O(n), Space O(n), result in array "visited" true part of cycle, false not part of cycle.

var data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 80,
  11, 12, 13, 14, 15, 16, 17, 18, 19, 81,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 82,
  31, 32, 33, 34, 35, 36, 37, 38, 39, 83,
  41, 42, 43, 44, 45, 46, 47, 48, 49, 84,
  51, 52, 53, 54, 55, 56, 57, 58, 59, 85,
  61, 62, 63, 64, 65, 66, 67, 68, 69, 86,
  71, 72, 73, 74, 75, 76, 77, 78, 79, 87,
  81, 82, 83, 84, 85, 86, 87, 80
];

var visited = new Array(data.length);

// Pass 1: mark
for (var i = 0; i < data.length; i++) {
  if (visited[i] == null) {
    var k;
    var j = i;
    do {
      visited[j] = false;
      k = j;
      j = data[j];
    } while (visited[j] == null);
    visited[k] = i == 0;
  }
}
// Pass 2: propagate
var cnt = 0;
for (var i = 0; i < visited.length; i++) {
  if (visited[i]) {
    var j = i;
    do {
      cnt ++;
      visited[j] = true;
      j = data[j];
    } while (j != i);
    break;
  }
}

//the visited array now contains information about nodes belonging to loop
document.write("Count: "+cnt);
document.write("<table><tr>");
for (var i = 0; i < visited.length; i++) {
  document.write("<td>" + i + "=>" + visited[i] + "</td>");
  if (i % 10 == 9) {
    document.write("<tr/><tr>");
  }
}

Edit: If there is always exactly one cycle in the data structure, the process can be simplified further:

var data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 80,
  11, 12, 13, 14, 15, 16, 17, 18, 19, 81,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 82,
  31, 32, 33, 34, 35, 36, 37, 38, 39, 83,
  41, 42, 43, 44, 45, 46, 47, 48, 49, 84,
  51, 52, 53, 54, 55, 56, 57, 58, 59, 85,
  61, 62, 63, 64, 65, 66, 67, 68, 69, 86,
  71, 72, 73, 74, 75, 76, 77, 78, 79, 87,
  81, 82, 83, 84, 85, 86, 87, 80
];

var visited = new Array(data.length);

// Pass 1: find one item which must be in cycle, max runtime O(n)
var j = 0;
do {
  visited[j] = false;
  j = data[j];
} while (visited[j] == null);
// Pass 2: follow cycle, max runtime O(n)
var cnt = 1;
for (var i = data[j]; i != j; i = data[i]) {
  cnt ++;
  // you could store the cycle nodes here to identify the non-cycle ones
}

//the visited array now contains information about nodes belonging to loop
document.write("Count: "+cnt);

Edit (again): Implemented support for multiple cycles; will output all cycle lengths and the nodes of the longest cycle. Complexity still O(n). :)

var data=[5,4,4,1,3,0];

var visited = new Array(data.length);

var chains = [];

// Pass 1: find all chains - complexity O(n)
for (var i = 0; i < data.length; i++) {
  if (!visited[i]) {
    var current = {
      chain: chains.length
    }
    var j = i;
    var len = 0;
    do {
      len++;
      visited[j] = current;
      j = data[j];
    } while (!visited[j]);
    if (visited[j].chain === current.chain) {
      chains.push(j); // this index is part of a cycle
    } else {
      current.chain = visited[j].chain;
    }
  }
}

// Pass 2: count elements, max complexity O(n) because no node is visited twice and only nodes which are part of a cycle will be visited
var lengths = new Array(chains.length);
var best = [];
for (var i = 0; i < chains.length; i++) {
  var curr = [];
  var j = chains[i];
  do {
    curr.push(j);
    j = data[j];
  } while (j != chains[i]);
  lengths[i] = curr.length;
  if (curr.length > best.length) {
    best = curr;
  }
}

// Output result:
document.write("<p>Identified cycle lengths: "+lengths.join(", ")+"</p>");
document.write("<p>IDs of longest cycle: "+best.join(", ")+"</p>");
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Lucero
  • 59,176
  • 9
  • 122
  • 152
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/130513/discussion-between-saeed-amiri-and-lucero). – Saeed Amiri Dec 13 '16 at 13:20
  • @Lucero Works fine, couldn't break it. Now, "If there is always exactly one cycle in the data structure": if the graph is totally connected, then there *will* be one cycle (because there's no sink for the water to flow out, a node need to always be connected to another node) and the cycle is unique (the pipes can join in a mixer, but no outgoing manifolds are allowed). Speaking of which: are you willing to step up the algo for multi-parted graphs and detect the loop with max length? (it is not what the original question asked). If positive, here's an example `var data=[5,4,4,1,3,0];` 1,3,4. – Adrian Colomitchi Dec 13 '16 at 15:23
  • @AdrianColomitchi Sorry, don't have the time right now, but that should not be very difficult based on the 2nd sample... – Lucero Dec 13 '16 at 15:26
  • @Lucero Thanks a lot. The Q is more of an... umm... "academic interest" for me (rather than a practical problem), so no obligation to go further if you don't want to and no hurry if you want. (I should go to sleep now, wee hours in the morning here). Thanks again. – Adrian Colomitchi Dec 13 '16 at 15:34
  • @AdrianColomitchi Added the multi-parted graph sample. Still O(n). Good night ;) – Lucero Dec 13 '16 at 16:01
  • I've rolled this back one last time to the author's version. Please don't engage in an edit war over style and what context the author believes is needed. You can always *vote* if you feel something is not helpful. – Martijn Pieters Dec 19 '16 at 11:50
2

Here is a code for

  1. Eliminating vertices which they do not belong to any cycle,

  2. Outputing all Cycles

  3. Outputing largest Cycle and its length

The code is self descriptive and also has lots of comments and one can simply understand the proper input format (which is not like the one OP provided).

http://ideone.com/boY5U2

BTW, maybe ideone destroy that code sooner than SO, so I include it here as well.

using System;
using System.Collections.Generic;
using System.Linq;

public class Node
{
    public int id;
    public Node next;
    public int indegree = 0;
    public bool visited = false;
}
public class Test
{
    public static void Main()
    {

        // read number of nodes of the graph
        int n = Convert.ToInt32(Console.ReadLine());

        Node []nodes = new Node[n];

        // initiate nodes
        for (int i=0;i<n;i++)
            nodes[i]=new Node{id = i};

        // reading input and initializing indegrees
        for (int i=0;i<n;i++)
        {
            // Each vertex has outgoing edge, so always there is a "next" neighbour
            var next = Convert.ToInt32(Console.ReadLine());

            nodes[i].next = nodes[next];    
            nodes[next].indegree++;
        }

        // Removing vertices of indegree zero recursively.
        // The code runs in O(n), in fact, it visits each edge only once, so it 
        // is O(|E|), but as every vertex has out degree one, |E|=O(|V|).
        for (int i=0;i<n;i++)
        {
                var current = nodes[i];

                while (current != null && current.indegree == 0)
                {
                    current.next.indegree--;
                    var tmp = current.next;
                    nodes[current.id] = null;
                    current = tmp;
                }
        }

        // Find All Cycles and Write Them in Console
        var cycles = new List<List<int>>();
        var count = 0;
        for (int i=0;i<n;i++)
        {
            if (nodes[i]!=null&& !nodes[i].visited)
            {
                cycles.Add(new List<int>());
                var current = nodes[i];

                while (!current.visited)
                {
                    cycles[count].Add(current.id);
                    nodes[current.id].visited = true;
                    current = current.next;
                }
                count++;
            }
        }

        // Print cycles and find largest cycle and its length
        if (cycles.Count > 0)
        {
            Console.WriteLine("All cycles:");
            for (int i=0;i<cycles.Count;i++)
                Console.WriteLine(String.Join(", ", cycles[i]));

            int maxLength = cycles.Max(x=>x.Count);
            Console.WriteLine("Maximum cycle length is: " + maxLength);

             Console.WriteLine("Largest cycle is: " +
                                String.Join(", ",cycles.FirstOrDefault( x=>x.Count==maxLength)));
        }
        else
            Console.WriteLine("There is no cycle in the graph");
    }
}
Saeed Amiri
  • 22,252
  • 5
  • 45
  • 83
2

If a tentacle pops from every cycle node, then we just need to find which nodes have more than two links to find the nodes of the cycle. This works also for "non-totally connected graphs".

Most of the code is dedicated to generate the graph with graphviz. I didn't check yet, but apparently this library offers a real solution for finding cycles in networks.

enter image description here

# Create an octopus flavoured node map, in the format:
# {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: [8], 8: [72], 9: [10], 10: [11], 11: [12], 12: [13], 13: [14], 14: [15], 15: [16], 16: [17], 17: [73], 18: [19], 19: [20], 20: [21], 21: [22], 22: [23], 23: [24], 24: [25], 25: [26], 26: [74], 27: [28], 28: [29], 29: [30], 30: [31], 31: [32], 32: [33], 33: [34], 34: [35], 35: [75], 36: [37], 37: [38], 38: [39], 39: [40], 40: [41], 41: [42], 42: [43], 43: [44], 44: [76], 45: [46], 46: [47], 47: [48], 48: [49], 49: [50], 50: [51], 51: [52], 52: [53], 53: [77], 54: [55], 55: [56], 56: [57], 57: [58], 58: [59], 59: [60], 60: [61], 61: [62], 62: [78], 63: [64], 64: [65], 65: [66], 66: [67], 67: [68], 68: [69], 69: [70], 70: [71], 71: [79], 72: [73], 73: [74], 74: [75], 75: [76], 76: [77], 77: [78], 78: [79], 79: [72]}
class Octopus():
    def create(self, octopus_id, complete=True):
        cycle_len = 8
        tentacle_len = 9
        node_n = 0
        output = {}
        first_node = octopus_id * ((tentacle_len * cycle_len) + cycle_len)

        for i in range(cycle_len):
            for j in range(tentacle_len - 1):
                tn = (i*tentacle_len) + j
                output[first_node+tn] = [first_node+tn+1]

        for k in range(cycle_len):
            cni = (k * tentacle_len) + tentacle_len - 1
            cnf = (tentacle_len * cycle_len) + k
            if cni not in output:
                output[first_node+cni] = []
            if cnf not in output:
                output[first_node+cnf] = []

            output[first_node+cni] += [first_node+cnf]
            if cnf + 1 >= (tentacle_len * cycle_len) + cycle_len:
                if complete:
                    output[first_node+cnf] += [first_node+(tentacle_len * cycle_len)]
            else:
                output[first_node+cnf] += [first_node+cnf+1]

        return output


# The tool that performs the node triage
class CycleFinder():

    nodes = {}

    # Distinguish nodes from different octopuses
    def find_groups(self, newcomers, cycles=[]):
        cycle_groups = []
        new_cycle = [newcomers]
        for cycle in cycles:
            linked_cycle = False
            for newcomer in newcomers:
                if newcomer in cycle:
                    linked_cycle = True
                    new_cycle += [cycle]
                    new_cycle = [list(set([item for sublist in new_cycle for item in sublist]))]
            if not linked_cycle:
                cycle_groups += [list(set(cycle))]
        cycle_groups += new_cycle

        return [list(i) for i in set(tuple(i) for i in cycle_groups)]

    # Given a node number, return the cycle id to which it belongs
    def get_cycle_id(self, node, cycles):
        for i,cycle in enumerate(cycles):
            if node in cycle:
                return i

    # Find nodes with more than two links
    def get_cycles(self):
        serialized_data = {}
        for node,link in self.nodes.items():
            if link:
                link = link[0]
                if node not in serialized_data:
                    serialized_data[node] = []
                if link not in serialized_data:
                    serialized_data[link] = []

                serialized_data[node] += [link]
                serialized_data[link] += [node]
        cycle_nodes = set()
        for node, links in serialized_data.items():
            if len(links) > 2:
                cycle_nodes.add(node)

        cycles=[]
        for node, links in serialized_data.items():
            cycles = self.find_groups([node] + links, cycles)

        grouped_cycle_nodes = {}
        for node in cycle_nodes:
            group_id = self.get_cycle_id(node, cycles)
            if group_id not in grouped_cycle_nodes:
                grouped_cycle_nodes[group_id] = []
            grouped_cycle_nodes[group_id] += [node]
        return grouped_cycle_nodes.values()


    # *** Generate graph ***

    def generate_colors(self, total_colors):
        colors = []
        for color in range(total_colors):
            h = float(color) / total_colors 
            rgb = tuple(int(i * 255) for i in self.colorsys.hsv_to_rgb(h,1,1))
            colors += ['#%02x%02x%02x' % (int(rgb[0]), int(rgb[1]), int(rgb[2]))]
        return colors

    # Draw pngs
    def output_images(self, image_id, cycles):

        A=self.pgv.AGraph()
        A.edge_attr['style']='bold'
        A.edge_attr['len']='2'
        A.node_attr['style']='filled, bold'
        A.node_attr['shape']='circle'
        A.node_attr['width']='.6'
        A.node_attr['height']='.6'
        A.node_attr['fixedsize']='true'
        A.node_attr['fontsize']='10'
        A.node_attr['fontname']='Helvetica-Bold'

        for node, links in self.nodes.items():
            for link in links:
                A.add_edge(node,link)

        # Get the color gradient in hex
        total_colors = 3
        colors = self.generate_colors(total_colors)

        for color_i, group in enumerate(cycles):
            _color = colors[color_i]
            for node in group:
                A.get_node(node).attr['fillcolor']= _color

        A.write('{folder_name}/tmp.dot'.format(folder_name=self.settings['image_folder_name'])) # write to simple.dot

        B=self.pgv.AGraph('{folder_name}/tmp.dot'.format(folder_name=self.settings['image_folder_name'])) # create a new graph from file
        B.layout('sfdp') # layout with default (neato)
        B.draw('{folder_name}/{file_name_prefix}_{file_n}.png'.format(
                folder_name=self.settings['image_folder_name'], 
                file_n = image_id,
                file_name_prefix = self.settings['image_file_name_prefix'],
            )
        ) 

    # Create output folder for the graph image
    def prepare_output_folder(self, folder_name, file_name_prefix):

        if not self.os.path.exists(folder_name):
            self.os.makedirs(folder_name)
        else:
            if self.settings['delete_previously_generated_images']:
                # Careful, this will clean up old png files with our file prefix 
                # from the output directory we created.
                files = self.glob.glob(folder_name + '/' + file_name_prefix + '*.png')

                for f in files:
                    self.os.remove(f)

    # Initialize our puzzle solver
    def __init__(self, **settings):

        # save settings in a property
        self.settings = {}
        for key, value in settings.iteritems(): 
            self.settings[key] = value

        self.nodes_per_group = 3

        # init image creation handling, if specified in the settings
        if 'create_images' in self.settings and self.settings['create_images']:

            # import all modules related to generating the images
            self.os = __import__('os')
            self.glob = __import__('glob')
            self.random = __import__('random')
            self.pgv = __import__('pygraphviz')
            self.colorsys = __import__('colorsys')

            self.prepare_output_folder(self.settings['image_folder_name'], self.settings['image_file_name_prefix'])

        self.nodes = self.settings['nodes']

    def close(self):
        self.log_status()
        print 'Finished.'



def main():

    # Create a node network, to start
    octopuses = Octopus().create(0, complete=False) # create an octopus with a non closed cycle

    for i in range(1,3):
        octopuses.update(Octopus().create(i)) # create 2 other octopuses

    # Create our puzzle solver

    ng = CycleFinder(

        # Create images
        create_images = True,
        # Default path where graph images will be created
        image_folder_name = 'graphs',
        # Filename prefix. Images will be postfixed with a number, eg: combination_0.png
        image_file_name_prefix = 'combination',
        # Clean up images folder before creating new batch
        delete_previously_generated_images = True,

        nodes = octopuses,
    )

    # Discriminate nodes that are part of the cycle
    cycles = ng.get_cycles()

    # Create an output graph
    ng.output_images(1, cycles)



if __name__ == "__main__":

    main()
Ivan Chaer
  • 6,980
  • 1
  • 38
  • 48
0

If you only want the count of the circle, then it is possible in O(1) space and O(n) time by abusing the adjacency vector:

function getCycleCount(adjacency) {
  let i = 0, count = 0;
  while(adjacency[i] >= 0) {
    let j = adjacency[i];
    adjacency[i] = -adjacency[i] - 1;
    i = j;
  }
  while(adjacency[i] < 0) {
    adjacency[i] = -adjacency[i] - 1;
    i = adjacency[i];
    count++;
  }
  for (i = 0; adjacency[i] < 0; i = adjacency[i])
    adjacency[i] = -adjacency[i] - 1;
  return count;
}

https://jsfiddle.net/d2gzdwej/

maraca
  • 8,468
  • 3
  • 23
  • 45
  • Well, despite calling it an abuse, modifying the vector (btw. I'd suggest to use the `~` operator) is actually O(n) space usage for the algorithm to work... – Lucero Dec 16 '16 at 09:36
  • @Lucero not really. Look at all those interview questions requesting O(1) space and O(n) time where the trick is to revert a linked list and do some calculations then revert it back again. We are talking about the extra space here... I'm never allocating any array. – maraca Dec 16 '16 at 16:06
  • Never had such an interview question. Anyways, fact is, the algorithm *does need* n bits of own information to work; whether this actually requires *extra* memory is another question IMHO. – Lucero Dec 16 '16 at 16:30