6

Given the following list of redirects

[
    {
        "old": "a",
        "target": "b"
    },
    {
        "old": "b",
        "target": "c"
    },
    {
        "old": "c",
        "target": "d"
    },
    {
        "old": "d",
        "target": "a"
    },
    {
        "old": "o",
        "target": "n"
    },
    {
        "old": "n",
        "target": "b"
    },
    {
        "old": "j",
        "target": "x"
    },
    {
        "old": "whatever",
        "target": "something"
    }
]

Here we can see that the first item "a" should redirect to "b". If we follow the list we can see the following pattern:

a -> b
b -> c
c -> d
d -> a

So we would end up with a circular reference since "a" would end up pointing to "d" and "d" points to "a".

What would be the most efficient way of finding the circular references?

I've come up with the following algorithm in C#

var items = JsonConvert.DeserializeObject<IEnumerable<Item>>(json)
    .GroupBy(x => x.Old)
    .Select(x => x.First())
    .ToDictionary(x => x.Old, x => x.Target);
var circulars = new Dictionary<string, string>();
foreach (var item in items)
{
    var target = item.Value;
    while (items.ContainsKey(target))
    {
        target = items[target];

        if (target.Equals(item.Key))
        {
            circulars.Add(target, item.Value);
            break;
        }
    }
}

This will give me a list containing 4 items looking like this:

[
    {
        "old": "a",
        "target": "b"
    },
    {
        "old": "b",
        "target": "c"
    },
    {
        "old": "c",
        "target": "d"
    },
    {
        "old": "d",
        "target": "a"
    }
]

But im only interested in telling the user something like

"Hey you can't do that, It will be a circular reference because of "a" points to "b" which points to "c" which points to "d" which points to "a"

So, do you guys have any suggestions? Im sure that some other(better) algorithms exists for doing this... :)

JOSEFtw
  • 9,781
  • 9
  • 49
  • 67
  • 6
    Try a *cycle detection algorithm* in *graph theory*. – Willem Van Onsem Sep 29 '17 at 08:27
  • Will look it up right away, thanks! :) – JOSEFtw Sep 29 '17 at 08:28
  • 1
    Or is it "Hey you can't do that, B and C is a circular reference because of A and D"? - You can only report a cycle - it has no start and end - it's a cycle. – MineR Sep 29 '17 at 08:29
  • Hmm, good question. The goal is to inform the user that they would end up in a redirection loop because of B, C and D(I guess?) – JOSEFtw Sep 29 '17 at 08:31
  • In our software, we say "Cycle detected, including nodes: A,B,C,D". – MineR Sep 29 '17 at 08:35
  • @MineR yeah that makes sense, I will do the same, thank you for your input. – JOSEFtw Sep 29 '17 at 08:36
  • You can improve your alghoritm avoiding to test items you have already found as a part of a circular. Also, you should store a circular in a different type of collection, i.e a List of all nodes included in the circular. And note that you can find more than a circular, so you will end having something like a List> (A list of circular, where each "circular" is a List of the nodes included in that circular). – Gian Paolo Sep 29 '17 at 10:21
  • Is each `old` and `target` unique by itself (no two entries with the same `old` value and no two entries with the same `target` value)? Otherwise your algorithm would probably fail anyway. – grek40 Sep 29 '17 at 10:36
  • Old is unique, target is not. – JOSEFtw Sep 29 '17 at 10:37

1 Answers1

4

While the generic graph-cycle-finding algorithms will work, your case is a bit special due to the "Old is unique, target is not" constraint. It effectively means, that each node can only have a single successor and thus it can at most be part of one cycle. Also, when DFS-Traversing the nodes, there won't be any fork, so an iterative DFS implementation becomes very easy.

Given an arbitrary starting node, this function can find a cycle that is reachable from the starting node:

/// <summary>
/// Returns a node that is part of a cycle or null if no cycle is found
/// </summary>
static string FindCycleHelper(string start, Dictionary<string, string> successors, HashSet<string> stackVisited)
{
    string current = start;
    while (current != null)
    {
        if (stackVisited.Contains(current))
        {
            // this node is part of a cycle
            return current;
        }

        stackVisited.Add(current);

        successors.TryGetValue(current, out current);
    }

    return null;
}

In order to maintain efficiency, it can be extended to early-return when an already checked node is reached (using previouslyVisited):

/// <summary>
/// Returns a node that is part of a cycle or null if no cycle is found
/// </summary>
static string FindCycleHelper(string start, Dictionary<string, string> successors, HashSet<string> stackVisited, HashSet<string> previouslyVisited)
{
    string current = start;
    while (current != null)
    {
        if (previouslyVisited.Contains(current))
        {
            return null;
        }
        if (stackVisited.Contains(current))
        {
            // this node is part of a cycle
            return current;
        }

        stackVisited.Add(current);

        successors.TryGetValue(current, out current);
    }

    return null;
}

The following function is used to maintain consistency of the visited sets

static string FindCycle(string start, Dictionary<string, string> successors, HashSet<string> globalVisited)
{
    HashSet<string> stackVisited = new HashSet<string>();

    var result = FindCycleHelper(start, successors, stackVisited, globalVisited);

    // update collection of previously processed nodes
    globalVisited.UnionWith(stackVisited);

    return result;
}

It is called for each old node in order to check for cycles. When a cycle starting node is detected, the cycle information can be separately created:

// static testdata - can be obtained from JSON for real code
IEnumerable<Item> items = new Item[]
{
    new Item{ Old = "a", Target = "b" },
    new Item{ Old = "b", Target = "c" },
    new Item{ Old = "c", Target = "d" },
    new Item{ Old = "d", Target = "a" },
    new Item{ Old = "j", Target = "x" },
    new Item{ Old = "w", Target = "s" },
};
var successors = items.ToDictionary(x => x.Old, x => x.Target);

var visited = new HashSet<string>();

List<List<string>> cycles = new List<List<string>>();

foreach (var item in items)
{
    string cycleStart = FindCycle(item.Old, successors, visited);

    if (cycleStart != null)
    {
        // cycle found, get detail information about involved nodes
        List<string> cycle = GetCycleMembers(cycleStart, successors);
        cycles.Add(cycle);
    }
}

Output your found cycles any way you want. For example

foreach (var cycle in cycles)
{
    Console.WriteLine("Cycle:");
    Console.WriteLine(string.Join(" # ", cycle));
    Console.WriteLine();
}

The implementation of GetCycleMembers is pretty easy - it depends on a correct starting node:

/// <summary>
/// Returns the list of nodes that are involved in a cycle
/// </summary>
/// <param name="cycleStart">This is required to belong to a cycle, otherwise an exception will be thrown</param>
/// <param name="successors"></param>
/// <returns></returns>
private static List<string> GetCycleMembers(string cycleStart, Dictionary<string, string> successors)
{
    var visited = new HashSet<string>();
    var members = new List<string>();
    var current = cycleStart;
    while (!visited.Contains(current))
    {
        members.Add(current);
        visited.Add(current);
        current = successors[current];
    }
    return members;
}
grek40
  • 13,113
  • 1
  • 24
  • 50
  • NICE! Thank you very much, i've modified my code and I tested with your code as well, I get the same output, really nice. Anyhow, it seems like I found a bug. I've updated the JSON in my example above, I've added two more items with key "o" and key "n". "n" points to "b" and im thinking that the algorithm should be able to see that b points to c and so on, do you understand what I mean? – JOSEFtw Sep 29 '17 at 12:19
  • @JOSEFtw it's the same cycle, no matter whether you start at `a` or `b`. My implementation is designed to find each cycle, not each starting point that ends in a cycle somewhere on its way. If you want a different result, you need a different implementation. – grek40 Sep 29 '17 at 12:48
  • Okey, so key "n" should not show up in the cycle, is that correct? – JOSEFtw Sep 29 '17 at 12:53
  • 1
    @JOSEFtw yes, that's my unterstanding, because you wouldn't reach `n` again after entering the cycle (it's not part of the cycle) – grek40 Sep 29 '17 at 13:01