
Say I have a List<List<Integer>> that contains lists of numbers from 1 to n. What is a good way of removing lists that have the same members, just in a different order?

Like if I have [[1,2,3], [2,1,3], [4,5,6]], I am considering the first and the second member as duplicates and I want to remove one of them (doesn't matter which one) to get [[2,1,3], [4,5,6]] or [[1,2,3], [4,5,6]].

There is an O(n^2) solution by looping through all the members and using list.contains(x), or even using a List<Set<Integer>>, but I wonder if there is a better solution.
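For reference, a minimal sketch of the quadratic approach described above (the class and method names are illustrative; this assumes the inner lists contain distinct elements, so set equality captures "same members"):

```java
import java.util.*;

public class BruteForceDedup {
    // Quadratic dedup: keep a list only if no previously kept list
    // has the same members (ignoring order).
    static List<List<Integer>> dedup(List<List<Integer>> lists) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> candidate : lists) {
            Set<Integer> members = new HashSet<>(candidate);
            boolean seen = false;
            for (List<Integer> kept : result) {
                if (members.equals(new HashSet<>(kept))) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = List.of(
            List.of(1, 2, 3), List.of(2, 1, 3), List.of(4, 5, 6));
        System.out.println(dedup(lists)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```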

Yar
  • Do the inner lists contain a fixed number of elements? In your example they all have the same length, 3. – Alex Aparin Nov 29 '16 at 07:04
  • @АлександрЛысенко we can assume they have a fixed number of elements – Yar Nov 29 '16 at 07:04
  • Is it possible to sort inner lists and then outer list? – MBo Nov 29 '16 at 07:15
  • @MBo well that works, but still not more efficient. – Yar Nov 29 '16 at 07:17
  • @Yar actually sorting is more efficient. Bruteforce would require `O(n ^ 2)` for one pair of lists. There are `n(n - 1)` pairs of elements in the list. Even with a few logical optimizations you won't get below `O(n ^ 4)`. On the other hand sorting all lists requires `O(n^2 log n)` and the identity-check can now be done in `O(n)` per pair of lists. So we've got `O(n^2 log n)` + `O(n^2 * n)`, which is `O(n^3)`. I'd say that's quite a bit better –  Nov 29 '16 at 08:25
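The sorting idea from the comments can be sketched as follows (names are illustrative): sort a copy of each inner list into a canonical form, and keep a HashSet of the canonical forms so each membership check is a hash lookup rather than a pairwise comparison.

```java
import java.util.*;

public class SortDedup {
    // Dedup by canonical form: sort a copy of each inner list and keep
    // the original only when its sorted form has not been seen before.
    static List<List<Integer>> dedup(List<List<Integer>> lists) {
        Set<List<Integer>> seen = new HashSet<>();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> list : lists) {
            List<Integer> sorted = new ArrayList<>(list);
            Collections.sort(sorted);
            if (seen.add(sorted)) {   // add() returns false for duplicates
                result.add(list);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of(
            List.of(1, 2, 3), List.of(2, 1, 3), List.of(4, 5, 6))));
        // prints [[1, 2, 3], [4, 5, 6]]
    }
}
```

For n lists of k elements this does the sorting in O(n k log k) and the lookups in expected O(n k), avoiding the pairwise comparison of all lists.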

2 Answers


One way of doing this would be to hash each list, then check more carefully lists with the same hash. There are a number of ways of doing this:

  1. If you build a hash from the XOR of the elements of a list, the hash is weak but cheap to build, as it is independent of the order of the elements. If there are n lists and k items per list, then building the hashes is just Θ(n k), which is very cheap. Of course, lists with identical hashes still need to be compared, and the weak hash of this method may cause more collisions than necessary.

  2. If you sort each list, then build a hash from the sorted result, the hash will be stronger, but building the hashes will take Θ(n k log(k)).

Which method works better depends on the setting.
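A sketch of the first variant (the class and method names are illustrative): bucket lists by the cheap order-independent XOR hash, and do the exact member comparison only for lists that land in the same bucket.

```java
import java.util.*;

public class HashDedup {
    // Cheap order-independent hash: XOR of the elements.
    static int xorHash(List<Integer> list) {
        int h = 0;
        for (int x : list) h ^= x;
        return h;
    }

    static List<List<Integer>> dedup(List<List<Integer>> lists) {
        // Bucket lists by their weak hash; only lists in the same bucket
        // need the more expensive exact comparison.
        Map<Integer, List<Set<Integer>>> buckets = new HashMap<>();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> list : lists) {
            List<Set<Integer>> bucket =
                buckets.computeIfAbsent(xorHash(list), k -> new ArrayList<>());
            Set<Integer> members = new HashSet<>(list);
            // Compare only against lists whose hash collided.
            if (bucket.stream().noneMatch(members::equals)) {
                bucket.add(members);
                result.add(list);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of(
            List.of(1, 2, 3), List.of(2, 1, 3), List.of(4, 5, 6))));
        // prints [[1, 2, 3], [4, 5, 6]]
    }
}
```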

Ami Tavory

Algorithm in a nutshell:

  1. Project each element of the outer list into a tuple of (hash, index).
  2. Sort the list of tuples by their first element (the hash).
  3. Keep the index of the first tuple for each distinct hash value.

The following code implements this algorithm:

using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
    // Computes a hash of an array (all input arrays are assumed to
    // have the same, fixed length)
    static int array_hash(int[] array)
    {
        int hc = array.Length;
        for (int i = 0; i < array.Length; ++i)
        {
            hc = unchecked(hc * 314159 + array[i]);
        }
        return hc;
    }

    static void Main(string[] args)
    {
        var lists = new List<List<int>>();
        lists.Add(new List<int>() { 1, 2, 3 });
        lists.Add(new List<int>() { 3, 2, 1 });
        lists.Add(new List<int>() { 4, 5, 6 });

        var hashes = new List<Tuple<int, int>>(lists.Count);

        // Hash a sorted copy of each inner list, so the hash is
        // independent of the element order
        for (int i = 0; i < lists.Count; ++i)
        {
            var inner_list_copy = lists[i].ToArray();
            Array.Sort(inner_list_copy);
            hashes.Add(Tuple.Create(array_hash(inner_list_copy), i));
        }
        hashes.Sort((tuple1, tuple2) => tuple1.Item1.CompareTo(tuple2.Item1));

        // Keep the index of the first tuple for each distinct hash value.
        // Note: lists with equal hashes are treated as duplicates; hash
        // collisions are not re-checked here.
        var indices = new List<int>();
        var last_hash = 0;
        if (hashes.Count != 0)
        {
            last_hash = hashes[0].Item1;
            indices.Add(hashes[0].Item2);
        }
        for (int i = 1; i < hashes.Count; ++i)
        {
            var new_hash = hashes[i].Item1;
            if (new_hash != last_hash)
            {
                last_hash = new_hash;
                indices.Add(hashes[i].Item2);
            }
        }

        Console.WriteLine("Indices");
        for (int i = 0; i < indices.Count; ++i)
        {
            Console.WriteLine(indices[i]);
        }

        Console.ReadLine();
    }
}

Note: you can explore the use of other hash functions; see C# hashcode for array of ints

P.S. Just for fun, a solution in Haskell:

import Data.List (group, sort)

-- f removes duplicates from a list of lists: sort each inner list,
-- sort the outer list so duplicates become adjacent, then group and
-- keep one representative per group
f = map head . group . sort . map sort
Alex Aparin