2

I have a dictionary<int, List<string>> and I want to intersect all of the lists for each int.

How would I achieve that? I feel like this should be easy, but for some reason it's not working out.

Thanks.

Rob Allen
  • 2,871
  • 2
  • 30
  • 48
  • What is not working out? what is your current solution? – Yosi Dahari Nov 14 '13 at 17:54
  • What do you mean by "for each int", there should be unique integer values in your dictionary, since it is a key – Habib Nov 14 '13 at 17:54
  • Do you mean intersect all lists, ignoring the keys? – Yosi Dahari Nov 14 '13 at 17:56
  • well, a dictionary of > would have unique keys of type int, each int has a list of strings, I want only the strings that exist in all lists – Rob Allen Nov 14 '13 at 17:56
  • @Yosi, yes that is correct – Rob Allen Nov 14 '13 at 17:56
  • I think you should develop bubble sort of logic. Its like you need to apply foreach() for your dictionary. Then pick first List and do intersect operation with every other list. Put comman element in dictionary object. Then again compare that dictionary object with its own element. Mean white plz share your logic as well so that we can get better idea. – Bhaskar Singh Nov 14 '13 at 18:03

3 Answers3

6

It's easy enough to iterate the sequence of lists, putting the first into a HashSet and then intersecting each subsequence list with it:

public static IEnumerable<T> intersectAll<T>(IEnumerable<IEnumerable<T>> source)
{
    using (var iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
            return Enumerable.Empty<T>();

        var set = new HashSet<T>(iterator.Current);
        while (iterator.MoveNext())
            set.IntersectWith(iterator.Current);

        return set;
    }
}

Using this you can write IntersectAll(dictionary.Values.Cast<IEnumerable<string>>()) to get the intersection.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • moreover, does this make a difference if I am forced to use .Net 3.5? My Dictionary> can't be inferred from the usage as indicated above. Trying to specify the type is a mess too – Rob Allen Nov 14 '13 at 19:40
  • 1
    @RobA You can call `Cast` to get `Values` into an `IEnumerable>` – Servy Nov 14 '13 at 19:45
  • @KingKing You only actually iterate the result of 1 of the 10,000 intersections. Mine doesn't defers execution, the other does. You aren't actually computing the intersection for 9,999 iterations of the `Aggregate` example. Also, in your example the intersection is *very* small. After the intersection of the first two sets it's two items. The primary difference between the two operations is that `Aggregate` is constantly taking that intermediate intersection and re-populating a new HashSet each time. Doing that for one or two items isn't expensive. Doing it for a lot is. – Servy Nov 14 '13 at 19:54
  • @KingKing The first issue is the most important. Just throw a `ToList` (or `Count`, or whatever) onto the end of each call of `IntersectAll` and `IntersectAggregate` and you'll see the differences become clear with that alone. Making the underlying collections larger, and have a more significant intersection, would only make the difference that much more pronounced. – Servy Nov 14 '13 at 19:59
  • @Servy well, just try enlarging the data and looks like this approach is faster than using `Aggregate`. I just try a random data with `1000` lists. Using `Aggregate` is slower when the data is large and the important thing is **it could throw StackOverflow exception**??? (when I use `10000` lists). – King King Nov 14 '13 at 20:05
  • @Servy you're right, appending `ToList()` will make they really different. I would like to confirm that using this approach is about **30 times** faster than using `Aggregate` (not mentioned about the chance of throwing `StackOverflowException` when the data is a little large for `Aggregate`). This is a great solution. Using `Aggregate` in this case looks like **useless** and should take care of the input size :) – King King Nov 14 '13 at 20:32
2

I think you're looking for something like the following;

List<string> TheIntersection = myDict.Select(x => x.Value).Aggregate((c, n) => c.Intersect(n)).ToList();
evanmcdonnal
  • 46,131
  • 16
  • 104
  • 115
  • 1
    Note that this method won't perform terribly well; you're constantly taking the intermediate sets and treating them as `IEnumerable` instances and repopulating a `HashSet` with them. That's a lot more work than putting them in one hashset and continually computing the intersection. – Servy Nov 14 '13 at 18:11
  • @servy, which method won't perform well? – Rob Allen Nov 14 '13 at 18:18
  • @RobA This one, because it's constantly re-populating an internal `HashSet` within `Intersect` that could, in theory, be re-used instead. It's a fairly noticeable extra overhead. – Servy Nov 14 '13 at 18:20
  • Also note that this method won't work if the dictionary is empty; it will throw an exception rather than returning an empty set. – Servy Nov 14 '13 at 18:37
1

I had a similar question as the OP a while back and ended up using Skeet's solution (which is similar to Servy's solution)

public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
    HashSet<T> hashSet = null;
    foreach (var list in lists)
    {
        if (hashSet == null)
         hashSet = new HashSet<T>(list);
        else
         hashSet.IntersectWith(list);
    }
    return hashSet == null ? new List<T>() : hashSet.ToList();
}

Then you can get your intersected list by ...

var intersectedList = IntersectAll(myDictionary.Values);
Community
  • 1
  • 1
Jed
  • 10,649
  • 19
  • 81
  • 125