I have a `List<List<T>>`. How can I count all the elements in this as if it were a single `List<T>`, in the fastest way?

So far I have used

int result = listOfLists
  .SelectMany(list => list)
  .Distinct()
  .ToList().Count;

but this actually creates a list and then counts the elements, which is not a very good idea.

– kasperhj (edited by Fredrik Mörk)

4 Answers

Using LINQ, I think your code is good with one small change: there is no need for `.ToList()`; just call the `Count()` extension directly, as follows:

int result = listOfLists.SelectMany(list => list).Distinct().Count();
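
As an aside, if what is wanted is the total number of elements including duplicates (the question asks for "all the elements", while `Distinct()` removes repeats), a minimal sketch without the de-duplication:

// Total element count across all inner lists, duplicates included.
int total = listOfLists.Sum(list => list.Count);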
– Homam (edited by CodesInChaos)

I would recommend a simple, nested loop with a HashSet if you need to eliminate duplicates between lists. It combines the SelectMany and Distinct operations into the set insertion logic and should be faster since the HashSet has O(1) lookup time. Internally Distinct() may actually use something similar, but this omits the construction of the single list entirely.

var set = new HashSet<T>();
foreach (var list in listOfLists)
{
    foreach (var item in list)
    {
        set.Add(item); // Add ignores duplicates, so no Contains check is needed
    }
}
var result = set.Count;
– tvanfosson
  • It should be a bit faster since some delegate calls get eliminated. Distinct already uses a HashSet with pretty much the same logic. And I don't see you doing the `SelectMany` here, so your current code is quite different. – CodesInChaos Jun 01 '11 at 12:28
  • @Homam -- good catch, it should have been a nested loop. It's basically the same, except no need to construct a combined, internal list to do the distinct over. This saves memory and may improve the time. – tvanfosson Jun 01 '11 at 12:29
  • You can leave out the `Contains`. `HashSet.Add` doesn't complain if the item already exists. – CodesInChaos Jun 01 '11 at 12:30
  • @CodeInChaos - I wish Microsoft would be consistent, a Dictionary would throw an exception -- but then again, I guess in that case you'd be inserting a duplicate *key* not a duplicate *item*. – tvanfosson Jun 01 '11 at 12:33
  • Maybe it can be shorter (and faster?) using `HashSet.UnionWith` – Shurdoof Jun 01 '11 at 12:39 (a sketch of this variant follows these comments)
  • @Shurdoof I'm pretty sure that'd be slower. The point of this answer is that it's fast. For good readability one could use Homam's answer. – CodesInChaos Jun 01 '11 at 12:52
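
For reference, a compilable sketch of the `HashSet.UnionWith` variant Shurdoof suggests above (the helper name is an illustration, not from the answer; whether it actually beats the explicit loop would need measuring):

static int CountDistinctItems<T>(List<List<T>> listOfLists)
{
    var set = new HashSet<T>();
    foreach (var list in listOfLists)
    {
        set.UnionWith(list); // adds each element, skipping any already in the set
    }
    return set.Count;
}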

To count the distinct elements within each inner list and sum the per-list counts, you could use the aggregation operators:

int count = listOfLists.Sum(l => l.Distinct().Count());
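
Note that this de-duplicates within each inner list only, not across lists, so it can give a different result from the `SelectMany(...).Distinct()` approach. A small illustrative sketch with made-up data:

var listOfLists = new List<List<string>>
{
    new List<string> { "a", "b" },
    new List<string> { "b", "c" }
};

int perList = listOfLists.Sum(l => l.Distinct().Count());        // 4 ("b" counted in both lists)
int global  = listOfLists.SelectMany(l => l).Distinct().Count(); // 3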
– Hasanain (edited by Daniel Hilgarth)

I'd like to take the chance to answer this question just to highlight when we should use LINQ and when a classic for loop. Unfortunately, people today don't care much about performance, as we have got used to working on very powerful computers. Anyway, just try the code below and you will discover that LINQ is more than 100 times slower than the classic for version. You should use LINQ only when the expression you need to write is really complex and you want to make it more readable. I didn't spend much time on the solution shown below, as I'd like to focus on the performance.

public static void Main(string[] args)
{
    // create the list
    List<List<string>> listOfList = new List<List<string>>
    {
        new List<string> { "1.1", "2.2" },
        new List<string> { "2.1", "2.2", "2.3" }
    };

    // stopwatch using LINQ
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    int totalUsingLinq = listOfList.Sum(x => x.Count);

    stopwatch.Stop();
    Console.WriteLine("Using Linq:{0}", stopwatch.Elapsed); // 00005713

    // stopwatch using classic for
    stopwatch.Reset();
    stopwatch.Start();

    int totalUsingFor = 0;
    for (int i = 0; i < listOfList.Count; i++)
    {
        var mainItem = listOfList[i];
        if (mainItem != null)
        {
            totalUsingFor += mainItem.Count;
        }
    }

    stopwatch.Stop();
    Console.WriteLine("Using for:{0}", stopwatch.Elapsed); // 0000010
}
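
A caveat: a single timed run like this also measures JIT compilation of the LINQ delegates, which is one reason the comments below time many repeated runs. A minimal warm-up sketch, assuming the same stopwatch and listOfList as above:

// Hypothetical warm-up: run the query once before timing so JIT compilation
// is not included in the measurement.
listOfList.Sum(x => x.Count);

stopwatch.Restart();
int totalAfterWarmUp = listOfList.Sum(x => x.Count);
stopwatch.Stop();
Console.WriteLine("Using Linq (warmed up):{0}", stopwatch.Elapsed);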

A distinct version using for (just as an example). In this case I have created a deliberately crude ("bottleneck") function that does the distinct, and it is still faster.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        // create the list
        List<List<string>> listOfList = new List<List<string>>
        {
            new List<string>
            {
                "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1"
            },
            new List<string>
            {
                "2.1", "2.2", "2.3", "2.3", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1",
                "1.1", "2.2", "1.1", "1.1", "2.2", "1.1", "1.1", "2.2", "1.1"
            }
        };

        // stopwatch using LINQ
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        int totalUsingLinq = listOfList.Sum(l => l.Distinct().Count());

        stopwatch.Stop();
        Console.WriteLine("Using Linq:{0}", stopwatch.Elapsed); // 000012150

        // stopwatch using classic for
        stopwatch.Reset();
        stopwatch.Start();

        int totalUsingFor = 0;
        for (int i = 0; i < listOfList.Count; i++)
        {
            var mainItem = listOfList[i];
            if (mainItem != null)
            {
                for (int y = 0; y < mainItem.Count; y++)
                {
                    if (mainItem[y] != null)
                    {
                        totalUsingFor++;
                        NullDuplicateItems(y, mainItem);
                    }
                }
            }
        }

        stopwatch.Stop();
        Console.WriteLine("Using for:{0}", stopwatch.Elapsed); // 0009440
    }

    // Nulls out later occurrences of list[index] so each distinct value is counted once.
    public static void NullDuplicateItems(int index, List<string> list)
    {
        var item = list[index];
        for (int i = index + 1; i < list.Count; i++)
        {
            if (list[i] == item)
            {
                list[i] = null;
            }
        }
    }
}
– Massimiliano Peluso
  • You're missing the `Distinct` functionality, so it is a completely different problem from the OP's problem. And both examples take perhaps a few microseconds with such small lists as in your example. So unless they are a bottleneck in practice, this is an example for using LINQ over the classic loop. – CodesInChaos Jun 01 '11 at 13:01
  • And when I ran each of them 100'000 times the factor was 40, not >100 – CodesInChaos Jun 01 '11 at 13:07 (a repeat-run harness along these lines is sketched after these comments)
  • I didn't spend time on the code, as I said: even if you add the distinct it will be at least 500 times quicker. In the example I didn't create a huge list because it is just an example, but if you want to give it a go you can try my code against a list with thousands of items and you will get the same result – Massimiliano Peluso Jun 01 '11 at 13:08
  • It is normal that the factor is 40, not >100, because the machine you are running on is multi-tasking, so it can happen that Windows is processing something during your test. You should run it lots of times and then calculate the average, but the classic for will always be much faster: a factor of 40 is a lot! – Massimiliano Peluso Jun 01 '11 at 13:09
  • That's why you write code that's a performance bottleneck in the classic way. But if it isn't, don't bother. 40 times almost nothing is still almost nothing. – CodesInChaos Jun 01 '11 at 13:11
  • "even if you add the distinct it will be at least 500 times quicker" Proof please? With your small example lists the factor was only 2-3 between Homam's and tvanfosson's code. – CodesInChaos Jun 01 '11 at 13:14
  • It looks like I'm wrong :-) Guys, what I'm saying is still quite true: the classic for is "closer" to the machine language than LINQ, so it is normal that it is quicker. I didn't say LINQ is the worst thing you can use (ReSharper won't agree with me :-)), but for this kind of task a classic for would be much faster http://www.dotnetscraps.com/dotnetscraps/post/LINQ-Performance-Part-1-LINQ-to-Collection.aspx – Massimiliano Peluso Jun 01 '11 at 13:16
  • The benchmark you linked is broken. The LINQ code finds all elements in the list, the for code only the first one. I'm sure the results are much closer (i.e. a factor of around 2-3) if the LINQ query is replaced by `strArray.First(x=>x==strFetch[i])` – CodesInChaos Jun 01 '11 at 13:57
  • +1 for even considering the performance implications of LINQ analysis v procedural code analysis. – Heather Oct 25 '12 at 18:39
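
To reduce the noise CodesInChaos's comments describe, one can time many repeated runs rather than a single pass. A minimal repeat-run sketch (the 100,000 iteration count comes from the comment above; note the classic-for version mutates the lists via NullDuplicateItems, so it would need a fresh copy of the data on every iteration):

const int iterations = 100000;

var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    // the work being measured, repeated to average out JIT and scheduler noise
    int total = listOfList.Sum(l => l.Distinct().Count());
}
stopwatch.Stop();
Console.WriteLine("LINQ x{0}: {1}", iterations, stopwatch.Elapsed);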