4

Given a list of lists (let's say 5 lists, to have a real number with which to work), I can find items that are common to all 5 lists with relative ease (see Intersection of multiple lists with IEnumerable.Intersect()) using a variation of the following code:

var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());

Now let's say that intersection ends up containing 0 items. It's quite possible that there are some objects that are common to 4/5 lists. How would I go about finding them in the most efficient way?

I know I could just run through all the combinations of 4 lists and save all the results, but that method doesn't scale very well (this will eventually have to be done on approx. 40 lists).

If no item is common to 4 lists, then the search would be repeated looking for items common to 3/5 lists, etc. Visually, this could be represented by lists of grid points and we're searching for the points that have the most overlap.

Any ideas?

EDIT: Maybe it would be better to look at each point and keep track of how many times it appears in each list, then create a list of the points with the highest occurrence?

Community
  • 1
  • 1
Kyle G.
  • 870
  • 2
  • 10
  • 22

2 Answers2

6

You can select all numbers (points) from all lists, and group them by value. Then sort result by group size (i.e. lists count where point present) and select most common item:

var mostCommon = listOfLists.SelectMany(l => l)
                            .GroupBy(i => i)
                            .OrderByDescending(g => g.Count())
                            .Select(g => g.Key)
                            .First();
// outputs 3

Instead of taking only first item, you can take several top items by replacing First() with Take(N).


Returning items with number of lists (ordered by number of lists):

var mostCommonItems = from l in listOfLists
                      from i in l
                      group i by i into g
                      orderby g.Count() descending
                      select new {
                         Item = g.Key,
                         NumberOfLists = g.Count()
                      };

Usage (item is a strongly-typed anonymous object):

var topItem = mostCommonItems.First();
var item = topItem.Item;
var listsCount = topItem.NumberOfLists;

foreach(var item in mostCommonItems.Take(3))
    // iterate over top three items
Sergey Berezovskiy
  • 232,247
  • 41
  • 429
  • 459
  • Give me a few minutes to try this out and I'll get back to you :) – Kyle G. Jul 17 '13 at 15:17
  • This is Linq, correct? I don't have any experience with Linq so far, but I'll give it a whirl! – Kyle G. Jul 17 '13 at 15:19
  • @KyleG. yep, this is LINQ – Sergey Berezovskiy Jul 17 '13 at 15:20
  • One more thing: can the number of lists be returned as well? E.g. in the case where the most common items appear in 4/5 lists, can that "4" be returned? – Kyle G. Jul 17 '13 at 15:20
  • @KyleG. yes, hold on I'll add number of lists – Sergey Berezovskiy Jul 17 '13 at 15:21
  • "Instead of taking only first item, you can take several top items by replacing `First()` with `Take(N)`." What if `N` is unknown? I'm looking to get ALL the items that in appear in the most number of lists, so `N` could theoretically carry between 1 and the size of the smallest list. – Kyle G. Jul 17 '13 at 15:28
  • Thats up to you - you will have collection with information about each unique item in all lists. I.e. `var uniqueItemsCount = mostCommonItems.Count()`. You can select any most common items you want to see with `Take` operator – Sergey Berezovskiy Jul 17 '13 at 15:30
1

You can first combine all the lists, then find the Mode of the list using a dictionary strategy as follows. This makes it pretty fast:

/// <summary>
/// Gets the element that occurs most frequently in the collection.
/// </summary>
/// <param name="list"></param>
/// <returns>Returns the element that occurs most frequently in the collection.
/// If all elements occur an equal number of times, a random element in
/// the collection will be returned.</returns>
public static T Mode<T>(this IEnumerable<T> list) 
{
    // Initialize the return value
    T mode = default(T);

    // Test for a null reference and an empty list
    if (list != null && list.Count() > 0)
    {
        // Store the number of occurences for each element
        Dictionary<T, int> counts = new Dictionary<T, int>();

        // Add one to the count for the occurence of a character
        foreach (T element in list)
        {
            if (counts.ContainsKey(element))
                counts[element]++;
            else
                counts.Add(element, 1);
        }

        // Loop through the counts of each element and find the 
        // element that occurred most often
        int max = 0;

        foreach (KeyValuePair<T, int> count in counts)
        {
            if (count.Value > max)
            {
                // Update the mode
                mode = count.Key;
                max = count.Value;
            }
        }
    }

    return mode;
}
Ted
  • 3,212
  • 25
  • 20
  • 1
    While the idea is there, LINQ makes all of this much shorter. – Candide Jul 17 '13 at 15:31
  • 1
    I have found that Linq is good at making your coding shorter, but frequently does not have good performance. It typically uses a simple brute force O(N) solution, if not more. Whereas, taking the time to get a dictionary involved will usually make things much faster. – Ted Jul 17 '13 at 15:40