1

I'm querying from a single view to retrieve a list of indicators. All the properties of an indicator can be retrieved from that single view.

Here's the code:

data = new DataContext();

var core = from item in data.Items
           where countryIDs.Contains(item.CountryId)
           && indicatorIDs.Contains(item.IndicatorId)
           orderby item.Indicator
           select item;

var x = from item in core.Distinct()
        group item by new { item.IndicatorId, item.Indicator }
            into indicator
            select new
            {
                IndicatorID = indicator.Key.IndicatorId,
                IndicatorDescription = indicator.Key.Indicator,
                Genders = from g in core
                          where g.Gender != null
                          && g.IndicatorId == indicator.Key.IndicatorId
                          select new Gender
                          {
                              GenderID = g.GenderId,
                              GenderDescription = g.Gender
                          },
                HasGender = (from g in core
                             where g.Gender != null
                             && g.IndicatorId == indicator.Key.IndicatorId
                             select g.GenderId).Count() > 0,
                AreaTypes = from rat in core
                                     where rat.IndicatorId == indicator.Key.IndicatorId
                                     && rat.AreaType != null
                                     select new AreaType
                                     {
                                         AreaTypeId = rat.AreaTypeId,
                                         AreaDescription = rat.AreaType
                                     },
                HasAreaType = (from rat in core
                                        where rat.IndicatorId == indicator.Key.IndicatorId
                                        && rat.AreaType != null
                                        select rat.AreaTypeId).Count() > 0,
                Sectors = from s in core
                          where s.IndicatorId == indicator.Key.IndicatorId
                          && s.Sector != null
                          select new Sector
                          {
                              SectorID = s.SectorId,
                              Title = s.Sector
                          },
                HasSector = (from s in core
                             where s.IndicatorId == indicator.Key.IndicatorId
                             && s.Sector != null
                             select s.SectorId).Count() > 0
            };

List<Indicator> indicators = new List<Indicator>();
Indicator i = new Indicator();
foreach (var item in x)
{
    i = new Indicator()
    {
        IndicatorID = item.IndicatorID,
        IndicatorDescription = item.IndicatorDescription,
        Genders = item.Genders.ToList(),
        AreaTypes = item.AreaTypes.ToList(),
        Sectors = item.Sectors.ToList(),
        HasGender = item.HasGender,
        HasAreaType = item.HasAreaType,
        HasSector = item.HasSector
    };
    indicators.Add(i);
}
return indicators;

It slows down when it reaches the foreach loop, when x is transformed. Is there any way to make this query convert to a list faster? Thank you.

dork
  • 4,396
  • 2
  • 28
  • 56

3 Answers3

4

It looks like you're doing a lot of unnecessary nested querying.

Your core query is doing some relatively expensive filtering and sorting before returning items. It's best to only execute this query once.

However, you're performing six unnecessary joins back on this query.

Your query Genders, for example, is requerying core an only keeping the items that have the same IndicatorId that you already grouped by! If I can assume that item.Indicator is one-to-one on item.IndicatorId then your group indicator already contains this subset.

You're querying for AreaTypes & Sectors in the same way.

Now each of HasGender, HasAreaType, & HasSector each repeat the above queries and force a .Count() on each of them only to check if the value is greater than zero. This is a waste since .Any() will check for at least one value much more cheaply for you.

Now to test how many times the core query is being accessed I created this test code:

var countryIDs = Enumerable.Range(0, 100).ToArray();
var indicatorIDs = Enumerable.Range(0, 100).ToArray();

data.Items.AddRange(
    Enumerable
        .Range(0, 100)
        .Select(n =>
            new Item()
            {
                CountryId = n,
                IndicatorId = n,
                Indicator = "Indicator",
                GenderId = n,
                Gender = "Gender",
                AreaTypeId = n,
                AreaType = "Area",
                SectorId = n,
                Sector = "Sector",
            }));

I modified core to look like this:

var counter = 0;
var core =
    (from item in data.Items
    where countryIDs.Contains(item.CountryId)
        && indicatorIDs.Contains(item.IndicatorId)
    orderby item.Indicator
    select item).Do(_ => counter++);

The Do operator is from the Reactive Extensions System.Interactive assembly.

Running your code I got the following result:

counter == 60100

Since I have put 100 items in the collection this tells me that your query is invoking a new execution of core 601 times!

This can be changed to execute core once quite easily.

First I modified core to look like this:

var query =
    from item in data.Items
    where countryIDs.Contains(item.CountryId)
        && indicatorIDs.Contains(item.IndicatorId)
    orderby item.Indicator
    select item;

var core = query.ToArray();

The .ToArray() brings the results of the query into memory.

The x query was then modified to look like this:

var x =
    from item in core.Distinct()
    group item by new
    {
        item.IndicatorId,
        item.Indicator
    } into indicator
    let Genders = (
        from g in indicator
        where g.Gender != null
        select new Gender
        {
            GenderID = g.GenderId,
            GenderDescription = g.Gender,
        }).ToList()
    let AreaTypes = (
        from rat in indicator
        where rat.AreaType != null
        select new AreaType
        {
            AreaTypeId = rat.AreaTypeId,
            AreaDescription = rat.AreaType,
        }).ToList()
    let Sectors = (
        from s in indicator
        where s.Sector != null
        select new Sector
        {
            SectorID = s.SectorId,
            Title = s.Sector,
        }).ToList()
    select new Indicator()
    {
        IndicatorID = indicator.Key.IndicatorId,
        IndicatorDescription = indicator.Key.Indicator,
        Genders = Genders,
        AreaTypes = AreaTypes,
        Sectors = Sectors,
        HasGender = Genders.Any(),
        HasAreaType = AreaTypes.Any(),
        HasSector = Sectors.Any(),
    };

Notice that I am computing each of Genders, AreaTypes, & Sectors once only and creating each as a list. This allowed me to change x to immediately produce instances of Indicator.

Now the final creation of the indicators list was simple:

var indicators = x.ToList();

When I used my sample data on this method my result was this:

counter == 100

This means that this query only hit the original core query once!

I then checked how the nesting behaves when I increased the original sample data to 1,000 items - I got a single hit with the new code and 6,001 hits with the original code - and it went much, much slower.

Remember that LINQ is lazily computed so execution doesn't occur where you define your query, but where you execute it.

So the advice here is that, memory permitting, you should execute your query as soon as practicable to bring your data into memory and then perform your computations once and only once.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • i'm awestruck! this is probably the most expert level answer that i ever read on SO! +1 ... and i'm still starring with awe on your answer. – Mare Infinitus Jun 26 '12 at 08:18
0

For starters, change every Count() > 0 into an Any() method, a Count will force a full scan of the table you are querying.

If this doesnt give you the performance gain you want, try to rewrite your query. I think performance will be higher if you first project your data into your anonymous type, and then group by that anonymous type.

pietervp
  • 225
  • 1
  • 7
0

You have some where clauses in your query (e.g. where s.IndicatorId == indicator.Key.IndicatorId)

Try to use Join Syntax here instead, this will make it faster. i.e. core join indicator in your case. something like

Your version of

from g in core 
where g.Gender != null && g.IndicatorId == indicator.Key.IndicatorId

will get something like this

From g In core Join indi In indicator 
on g.IndicatorId Equals indi.Key.IndicatorId

Why is LINQ JOIN so much faster than linking with WHERE?

Community
  • 1
  • 1
Mare Infinitus
  • 8,024
  • 8
  • 64
  • 113
  • Hey, mare. I tried using join instead of where but it was slower. Processing time was 6750ms for the join and 1765.6476ms for the where – dork Jun 26 '12 at 06:39
  • Can you provide how large those datasets that you are linqing are? how does your join'ed linq look like? – Mare Infinitus Jun 26 '12 at 07:40