Find Circular Items in a Collection (of non-Transitive Items)

Question

Problem:

I have got a simple List<T> and I'm trying to sort it. But items in the list are not all transitive in terms of comparability, i.e., for e.g. my List<T> looks like:

A
B
C
D
E

where A > B and B > C but C > A. It is also possible to have circular greatness like A > B, B > C, C > D but D > A, i.e., it need not be always a group of 3. What I want is to find all groups of circular greatnesses in the given List<T>. For e.g., assuming A > B > C > A and A > B > C > D > A are the two circular groups in the above case my output should look either like:

List<List<T>> circulars = [[A, B, C, A], [A, B, C, D, A]]

or

List<List<T>> circulars = [[A, B, C], [A, B, C, D]]
// but in this case I do not want duplicates in the output. 
// For e.g., the output shouldn't have both [B, C, A] and [A, B, C]
// since both the groups refer to the same set of circular items A, B & C
// as B > C > A > B is also true. 
// But [B, A, C] is a different group (though nothing circular about it)

Either one is fine with me. I prefer small (linquish) solution but this didn't look as easy as it first seemed. May be I'm missing something very simple.

Scenario:

This is a part of sports analysis where one player/team will be stronger than the other which in turn will be stronger than another but the last one will be stronger than the first. I cant reveal more information, but let me take a case of head-to-heads in sports, especially in tennis and chess the individual match-ups lead to this kind of situation. For e.g., in terms of head-to-head, Kramnik leads Kasparov and Kasparov leads Karpov but Karpov leads Kramnik. Or for another e.g., Federer leads Davydenko, Davydenko leads Nadal but Nadal leads Federer.

My class looks like this:

class Player : IComparable<Player>
{
    // logic
}

This is what I tried:

First generate all possible permutations of collection items with a minimum group size of 3. Like [A B C], [A, C, B]...., [A, B, C, D], [A, B, D, C].... etc (This is very slow)
Then go through the entire sub groups and check for patterns. Like if there are any situations where A > B > C > D (This is reasonably slow, but I'm ok with it)
Lastly go through the entire sub groups to remove the duplicate groups like [A, B, C] and [B, C, A] etc.

Code:

var players = [.....]; //all the players in the collection

// first generate all the permutations possible in the list from size 3 
// to players.Count
var circulars = Enumerable.Range(3, players.Count - 3 + 1)
               .Select(x => players.Permutations(x))
               .SelectMany(x => x)
               .Select(x => x.ToList())

// then check in the each sublists if a pattern like A > B > C > A is 
// generated                                                                          vv    this is the player comparison
               .Where(l => l.Zip(l.Skip(1), (p1, p2) => new { p1, p2 }).All(x => x.p1 > x.p2) && l.First() < l.Last())

// then remove the duplicate lists using special comparer
               .Distinct(new CircularComparer<Player>())
               .ToList();
  
public static IEnumerable<IEnumerable<T>> Permutations<T>(this IEnumerable<T> list, int length)
{
    if (length == 1) 
        return list.Select(t => new[] { t });

    return Permutations(list, length - 1)  
          .SelectMany(t => list.Where(e => !t.Contains(e)), (t1, t2) => t1.Concat(new[] { t2 }));
}

class CircularComparer<T> : IEqualityComparer<ICollection<T>>
{
    public bool Equals(ICollection<T> x, ICollection<T> y)
    {
        if (x.Count != y.Count)
            return false;

        return Enumerable.Range(1, x.Count)
              .Any(i => x.SequenceEqual(y.Skip(i).Concat(y.Take(i))));
    }

    public int GetHashCode(ICollection<T> obj)
    {
        return 0;
    }
}

The problem with this approach is that it is extremely slow. For a collection of just around 10 items, the permutations that has to generated itself is huge (close to 1 million items). Is there a better approach which is reasonably efficient? I am not after the fastest code possible. Is there a better recursive approach here? Smells like it.

score 2 · Answer 1 · edited May 23 '17 at 12:14

The scenario...

[A, B, C, D, E]

where A > B, B > C, C > D, C > A, D > A

...could be represented as a directed graph using the convention that A -> B means A > B:

Graph

So the question is essentially "How can I find cycles in a directed graph?"

To solve that, you can use Tarjan's strongly connected components algorithm. I would recommend looking up an good implementation of this algorithm and apply it to your scenario.

score 1 · Answer 2 · answered Jun 26 '15 at 03:37

There are numerous means for enumerating the permutations of N objects such that each permutation can be efficiently obtained from its index in the enumeration. One such as this excerpt from my tutorial on CUDOFY using the Travelling Salesman problem:

    /// <summary>Amended algorithm after SpaceRat (see Remarks): 
    /// Don't <b>Divide</b> when you can <b>Multiply</b>!</summary>
    /// <seealso cref="http://www.daniweb.com/software-development/cpp/code/274075/all-permutations-non-recursive"/> 
    /// <remarks>Final loop iteration unneeded, as element [0] only swaps with itself.</remarks>
  [Cudafy]
  public static float PathFromRoutePermutation(GThread thread, 
            long  permutation, int[,] path) {
     for (int city = 0; city < _cities; city++) { path[city, thread.threadIdx.x] = city; }

     var divisor = 1L;
     for (int city = _cities; city > 1L; /* decrement in loop body */) {
        var dest    = (int)((permutation / divisor) % city);
        divisor     *= city;

        city--;

        var swap                        = path[dest, thread.threadIdx.x];
        path[dest, thread.threadIdx.x]  = path[city, thread.threadIdx.x];
        path[city, thread.threadIdx.x]  = swap;
     }
     return 0;
    }
    #endregion
}

From this point one is able to readily perform the identification of permutations with circular greatness in parallel. One can first use the multiple cores on the CPU to achieve improved performance, and then those available on the GPU. After repeated tunings of the Travelling Salesman problem in this way I improved performance for the 11 cities case from over 14 seconds (using CPU only) to about .25 seconds using my GPU; an improvement of 50 times.

Of course, your mileage will vary according to other aspects of the problem as well as your hardware.

I have a couple of questions: what is `_cities` here, where do you get it from? What is the float you are returning? — nawfal, Jun 26 '15 at 11:28

nawfal · Accepted Answer · 2015-06-30T08:24:42.250

I could improve performance drastically relying on recursion. Rather than generate the entire permutations of sequences possible beforehand, I now recurse through the collection to find cycles. To help with that I created circular references of itself (greater and lesser items) so that I can traverse through. Code is a bit longer.

Here is the basic idea:

I created a basic interface ICyclic<T> which has to be implemented by the Player class.
I traverse through the collection and assign the lesser and greater items (in the Prepare method).
I ignore the really bad ones (ie for which there are no lesser items in the collection) and really good ones (ie for which there are no greater items in the collection) to avoid infinite recursion and generally improve performance. The absolute best ones and worst ones don't contribute to cycles. All done in Prepare method.
Now each item will have a collection of items lesser than the item. And the items in the collection will have their own collection of worse items. And so on. This is the path I recursively traverse.
At every point the last item is compared to the first item in the visited path to detect cycles.
The cycles are added to a HashSet<T> to avoid duplicates. An equality comparer is defined to detect equivalent circular lists.

Code:

public interface ICyclic<T> : IComparable<T>
{
    ISet<T> Worse { get; set; }
    ISet<T> Better { get; set; }
}

public static ISet<IList<T>> Cycles<T>(this ISet<T> input) where T : ICyclic<T>
{
    input = input.ToHashSet();
    Prepare(input);

    var output = new HashSet<IList<T>>(new CircleEqualityComparer<T>());
    foreach (var item in input)
    {
        bool detected;
        Visit(item, new List<T> { item }, item.Worse, output, out detected);
    }

    return output;
}

static void Prepare<T>(ISet<T> input) where T : ICyclic<T>
{
    foreach (var item in input)
    {
        item.Worse = input.Where(t => t.CompareTo(item) < 0).ToHashSet();
        item.Better = input.Where(t => t.CompareTo(item) > 0).ToHashSet();
    }

    Action<Func<T, ISet<T>>> exceptionsRemover = x =>
    {
        var exceptions = new HashSet<T>();
        foreach (var item in input.OrderBy(t => x(t).Count))
        {
            x(item).ExceptWith(exceptions);
            if (!x(item).Any())
                exceptions.Add(item);
        }

        input.ExceptWith(exceptions);
    };
    exceptionsRemover(t => t.Worse);
    exceptionsRemover(t => t.Better);
}

static void Visit<T>(T item, List<T> visited, ISet<T> worse, ISet<IList<T>> output, 
                     out bool detected) where T : ICyclic<T>
{
    detected = false;

    foreach (var bad in worse)
    {
        Func<T, T, bool> comparer = (t1, t2) => t1.CompareTo(t2) > 0;

        if (comparer(visited.Last(), visited.First()))
        {
            detected = true;
            var cycle = visited.ToList();
            output.Add(cycle);
        }

        if (visited.Contains(bad))
        {
            var cycle = visited.SkipWhile(x => !x.Equals(bad)).ToList();
            if (cycle.Count >= 3)
            {
                detected = true;
                output.Add(cycle);
            }
            continue;
        }

        if (bad.Equals(item) || comparer(bad, visited.Last()))
            continue;

        visited.Add(bad);

        Visit(item, visited, bad.Worse, output, out detected);
        if (detected)
            visited.Remove(bad);
    }
}

public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
{
    return new HashSet<T>(source);
}

public class CircleEqualityComparer<T> : IEqualityComparer<ICollection<T>>
{
    public bool Equals(ICollection<T> x, ICollection<T> y)
    {
        if (x.Count != y.Count)
            return false;

        return Enumerable.Range(1, x.Count)
              .Any(i => x.SequenceEqual(y.Skip(i).Concat(y.Take(i))));
    }

    public int GetHashCode(ICollection<T> obj)
    {
        return unchecked(obj.Aggregate(0, (x, y) => x + y.GetHashCode()));
    }
}

Original answer (from OP)

On the plus side, this is shorter and concise. Also since it doesn't rely on recursion, it need not have an ICyclic<T> constraint, any IComparable<T> should work. On the minus side, it is slow as molasses in January.

public static IEnumerable<ICollection<T>> Cycles<T>(this ISet<T> input) where T : IComparable<T>
{
    if (input.Count < 3)
        return Enumerable.Empty<ICollection<T>>();

    Func<T, T, bool> comparer = (t1, t2) => t1.CompareTo(t2) > 0;

    return Enumerable.Range(3, input.Count - 3 + 1)
          .Select(x => input.Permutations(x))
          .SelectMany(x => x)
          .Select(x => x.ToList())
          .Where(l => l.Zip(l.Skip(1), (t1, t2) => new { t1, t2 }).All(x => comparer(x.t1, x.t2))
                   && comparer(l.Last(), l.First()))
          .Distinct(new CircleEqualityComparer<T>());
}

public static IEnumerable<IEnumerable<T>> Permutations<T>(this IEnumerable<T> list, int length)
{
    if (length == 1)
        return list.Select(t => new[] { t });

    return Permutations(list, length - 1)
          .SelectMany(t => list.Where(e => !t.Contains(e)), (t1, t2) => t1.Concat(new[] { t2 }));
}

public class CircleEqualityComparer<T> : IEqualityComparer<ICollection<T>>
{
    public bool Equals(ICollection<T> x, ICollection<T> y)
    {
        if (x.Count != y.Count)
            return false;

        return Enumerable.Range(1, x.Count)
              .Any(i => x.SequenceEqual(y.Skip(i).Concat(y.Take(i))));
    }

    public int GetHashCode(ICollection<T> obj)
    {
        return unchecked(obj.Aggregate(0, (x, y) => x + y.GetHashCode()));
    }
}

Few things to note:

I have used a ISet<T>s and HashSet<T>s instead of more traditional List<T>s, but it is just to make the intent more clear, that no duplicate items are allowed. Lists should work just fine.
.NET doesn't really have insertion order preserving set (ie which allows no duplicates), hence had to use List<T> at many places. A set might have marginally improved performance, but more importantly using set and list interchangeably causes confusion.
The first approach gives performance jump of the order of 100 times over the second one.
The second approach can be sped up by utilizing the Prepare method. The logic holds there too, ie, lesser members in collection means lesser permutations to generate. But still very very painfully slow.
I have made the methods generic but the solution can be made more general purpose. For eg, in my case the cycle is detected based on a certain comparison logic. This can be passed as a parameter, ie, the items in the collection need not be just comparable, it could be any vertex determining logic. But that's left to readers' exercise.
In my code (both the examples) only cycles of minimum size 3 are considered, ie, cycles like A > B > C > A. It doesn't account for cycles like A > B, B > A situations. In case you need it, change all the instances of 3 in the code to whatever you like. Even better pass it to the function.

Find Circular Items in a Collection (of non-Transitive Items)

Problem:

Scenario:

3 Answers3