2

Little bit of background:

I'm making a small app to demonstrate the use of LINQ, so I'm supposed to use mostly LINQ methods. The app is going to show some info about movies and TV shows and make suggestions based on filters.

I've made three classes: TvShow, Season and Episode. TvShow contains List of Season and Season contains List of Episode. Episode contains its List of Actor which is the cast for that episode. I want to make a method in class TvShow that returns a complete list of cast based on individual episode's list of cast.

I decided to use either Union or Distinct, but I'm not sure which approach is better performance wise, since I believe that's the only reason to pick one over the other in this very example (I know performance is not really an issue on an app this small, but I'd like to know how this would behave on a much larger scale).

Here are both methods:

    public List<Actor> AllCast()
    {
        List<Actor> actors = new List<Actor>();
        foreach (Season s in seasons)
        {
            s.Episodes.ForEach(e => actors.AddRange(e.Cast));
        }
        return actors.Distinct().ToList();
    }

OR

    public List<Actor> AllCast()
    {
        List<Actor> actors = new List<Actor>();
        foreach(Season s in seasons)
        {
            s.Episodes.ForEach(e => actors.AddRange(actors.Union(e.Cast)));
        }
        return actors;
    }

The thoughts I'm having are really: is it better to keep adding several lists to one big list and then going through that giant list and returning only distinct values OR is it better to go through one small and one growing list and compare values to find a union (I assume that's how Union finds its result), and then adding it to an already unique list?

P.S. I'm aware of HashSet, but I'd really like to use LINQ here because that's sort of the purpose of my project.

borrovnica
  • 31
  • 5
  • 1
    Aside from anything else, you can use `SelectMany` to help a lot: `return seasons.SelectMany(s => s.Episodes).SelectMany(e => e.Cast).Distinct().ToList();` Now, do you have enough data that you can *measure* whether that's too slow or not? Can you create such data? – Jon Skeet Sep 19 '17 at 13:43
  • Isn't this a duplicate or at least related question? https://stackoverflow.com/questions/44883301/what-is-the-simplest-way-to-achieve-on-performance-when-creating-the-union-of – Tim Schmelter Sep 19 '17 at 13:45
  • @JonSkeet unfortunately I don't have ANY data at the moment, so I can't test it :/ I plan on adding a couple of data later, just to have a presentation result, but not enough to actually test anything, that's why I was asking here in case someone already researched this. This is mostly a project for myself to do while learning LINQ, it doesn't really have to do with performance, but I wanted to think about those things as well while I'm doing a deeper research, so I'm aware of it in the future – borrovnica Sep 19 '17 at 14:04

2 Answers2

7

Your second approach needs to internally build a new HashSet for each season, comparing the actors in that season with all the actors we've seen before - I'd expect that to be slower than doing a single pass over all actors, putting them into one set to obtain uniqueness.

I'd use SelectMany twice to achieve it all in LINQ though:

public List<Actor> AllCast() =>
    seasons                         // All seasons
       .SelectMany(s => s.Episodes) // All episodes as a flat sequence
       .SelectMany(e => e.Cast)     // All actors as a flat sequence
       .Distinct()                  // Distinct
       .ToList(); 
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I didn't realise it uses HashSet in the background, thanks! And I had yet to explore SelectMany, so thank you for pointing it out to me, this is even better since it's pure LINQ! – borrovnica Sep 19 '17 at 14:25
1

Since it's part of your project to use LINQ, why not this pure LINQ apporach:

public List<Actor> AllCast()
{
    List<Actor> actors = seasons
        .SelectMany(season => season.Episodes.SelectMany(episode => episode.Cast))
        .Distinct()
        .ToList();
    return actors;
}

I guess this is also one of the most efficient approaches. It flattens all contained objects into one big sequence and then uses Distinct(uses a set) only once. A non-LINQ approach would be to have nested foreach loops which add the actors into a HashSet<Actor> and then call set.ToList().

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Have added my own answer as a slight variant of yours - I'd make two "top-level" SelectMany calls rather than one nested in another. I doubt that it'll be significantly more or less efficient. – Jon Skeet Sep 19 '17 at 14:07
  • I had yet to research SelectMany, thank you for showing me an even better method than the ones I tried above! – borrovnica Sep 19 '17 at 14:26