4

Right now, I have a class called TrainingPlan that looks like this:

public class TrainingPlan
{
   public int WorkgroupId { get; set; }
   public int AreaId { get; set; }
}

I'm given an array of these instances, and need to load the matching training plans from the database. The WorkgroupId and AreaId basically form a compound key. What I'm doing now is looping through each TrainingPlan like so:

foreach (TrainingPlan plan in plans)
   LoadPlan(pid, plan.AreaId, plan.WorkgroupId);

Then, LoadPlan has a LINQ query to load the individual plan:

var q = from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
        where tp.PROJECTID == pid && tp.AREAID == areaid &&
              tp.WORKGROUPID == workgroupid
              select tp;

return q.FirstOrDefault();

The Problem:

This works, however it's very slow for a large array of plans. I believe this could be much faster if I could perform a single LINQ query to load in every TPM_TRAININGPLAN at once.

My Question:

Given an array of TrainingPlan objects, how can I load every matching WorkgroupId/AreaId combination at once? This query should translate into similar SQL syntax:

SELECT * FROM TPM_TRAININGPLANS
WHERE (AREAID, WORKGROUPID) IN ((1, 2), (3, 4), (5, 6), (7, 8));
Mike Christensen
  • 88,082
  • 50
  • 208
  • 326
  • Just to make sure, you have indexes set up *properly* in the database? – Darko Kenda Apr 30 '13 at 19:46
  • You can't construct an IN clause like you want with LINQ to SQL, because it's not valid SQL. Instead, you need to construct a query of the form "SELECT * FROM TPM_TRAININGPLANS WHERE ((AREAID = 1 AND WORKGROUPID = 2) OR (AREAID = 3 AND WORKGROUPID = 4))". Doing that with LINQ will be a mess of expression trees, but it's doable. – kevingessner Apr 30 '13 at 19:52
  • @Carko - Seems that way. When I do an `EXECUTE EXPLAIN` on a similar query, it says it's using an index. – Mike Christensen Apr 30 '13 at 19:52
  • @kevingessner - Works fine on my database, but yea that SQL was for demonstration purposes only. It might have to be a big mess of `OR` clauses. – Mike Christensen Apr 30 '13 at 19:55
  • Only a comment. Bit shift the two int into a single int64. But I don't know EF well enough to know it that lets you do it in one query. – paparazzo Apr 30 '13 at 20:05
  • @Blam - I'm pretty sure then I'd have to create a column on the SQL table with the bit-shifted combined values, then index that. – Mike Christensen Apr 30 '13 at 20:08
  • This seems like a problem where processing plans as a queue with multiple threads would speed up the process. Although that doesn't really solve the problem directly. Another option assuming the table stays relatively constant and it's not too big would be to cache it in memory as a static class. – cgotberg Apr 30 '13 at 20:12
  • Yes you would have to create column. – paparazzo Apr 30 '13 at 20:23
  • I would try experimenting with a join with a table created from the values for TrainingPlan. Just an idea starter – gabnaim Apr 30 '13 at 20:41
  • See: http://stackoverflow.com/questions/857973/linq-to-entities-sql-in-clause – Kittoes0124 Apr 30 '13 at 20:53
  • @Kittoes - That only works for single values, not *pairs* of values. – Mike Christensen Apr 30 '13 at 20:55
  • So packing the two values into a single column does fix this? Why are you opposed to a single packed value? – paparazzo May 01 '13 at 14:16

2 Answers2

1

I've used Contains to run a bulk filter similar to where-in. I setup a rough approximation of your scenario. The single select queries actually ran quicker than Contains did. I recommend running a similar test on your end with the DB tied in to see how your results wind up. Ideally see how it scales too. I'm running .NET 4.0 in visual studio 2012. I jammed in ToList() calls to push past potential lazy loading problems.

public class TrainingPlan
{
    public int WorkgroupId { get; set; }
    public int AreaId { get; set; }

    public TrainingPlan(int workGroupId, int areaId)
    {
        WorkgroupId = workGroupId;
        AreaId = areaId;
    }    
}

public class TrainingPlanComparer : IEqualityComparer<TrainingPlan>
{
    public bool Equals(TrainingPlan x, TrainingPlan y)
    {
        //Check whether the compared objects reference the same data. 
        if (x.WorkgroupId == y.WorkgroupId && x.AreaId == y.AreaId) 
            return true;

        return false;                        
    }

    public int GetHashCode(TrainingPlan trainingPlan)
    {            
        if (ReferenceEquals(trainingPlan, null)) 
            return 0;

        int wgHash = trainingPlan.WorkgroupId.GetHashCode();
        int aHash = trainingPlan.AreaId.GetHashCode();

        return wgHash ^ aHash;
    }
}


internal class Class1
{
    private static void Main()
    {
        var plans = new List<TrainingPlan>
            {
                new TrainingPlan(1, 2),
                new TrainingPlan(1, 3),
                new TrainingPlan(2, 1),
                new TrainingPlan(2, 2)
            };

        var filter = new List<TrainingPlan>
            {
                new TrainingPlan(1, 2),
                new TrainingPlan(1, 3),
            };

        Stopwatch resultTimer1 = new Stopwatch();
        resultTimer1.Start();
        var results = plans.Where(plan => filter.Contains(plan, new TrainingPlanComparer())).ToList();
        resultTimer1.Stop();

        Console.WriteLine("Elapsed Time for filtered result {0}", resultTimer1.Elapsed);

        Console.WriteLine("Result count: {0}",results.Count());

        foreach (var item in results)
        {
            Console.WriteLine("WorkGroup: {0}, Area: {1}",item.WorkgroupId, item.AreaId);
        }

        resultTimer1.Reset();

        resultTimer1.Start();
        var result1 = plans.Where(p => p.AreaId == filter[0].AreaId && p.WorkgroupId == filter[0].WorkgroupId).ToList();
        var result2 = plans.Where(p => p.AreaId == filter[1].AreaId && p.WorkgroupId == filter[1].WorkgroupId).ToList();
        resultTimer1.Stop();

        Console.WriteLine("Elapsed time for single query result: {0}",resultTimer1.Elapsed);//single query is faster

        Console.ReadLine();
    }
}
P.Brian.Mackey
  • 43,228
  • 68
  • 238
  • 348
  • I'm pretty sure LINQ won't be able to translate that to a SQL statement. Either that, or it would have to load all plans (the table has about 200,000+ rows) to perform the filter. – Mike Christensen Apr 30 '13 at 21:06
  • @MikeChristensen - I'm not how LINQ to SQL or whatever engine your using would deal with the translation. There are tools that can show you if that's a requirement. Personally, I would test the performance rather than try to tackle reverse engineering the tools or otherwise deep diving into implementation detail. – P.Brian.Mackey Apr 30 '13 at 21:10
  • Seems to me that if you want A SQL query it would be easier to write a stored proc/inline sql than to use a tool to generate it. – P.Brian.Mackey Apr 30 '13 at 21:22
  • Sure, but then you'd have to work with data readers rather than entities. It'd be a total re-design... and a hack. – Mike Christensen Apr 30 '13 at 21:24
  • After more analysis, turns out the loading portion actually *wasn't* the bottleneck. There were actually all sorts of weird intermittent issues, which I traced down to the fact that the 32bit Oracle XE is not compatible with 64bit Windows 7 and the exact combination of things I was doing was exposing all sorts of strange behavior. I installed Oracle 64bit Standard, and things are actually running much more smoothly. Thanks for doing this testing though! – Mike Christensen May 14 '13 at 04:53
0

It seems to me that using Intersect() may get this done the way that you want. But, I don't have an environment set up to test this myself.

var q = (from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
        where pid == tp.PROJECTID
        select tp)
        .Intersect
        (from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
        where plans.Any(p => p.AreaID == tp.AREAID)
        select tp)
        .Intersect
        (from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
        where plans.Any(p => p.WorkgroupId == tp.WORKGROUPID)
        select tp);

My only concern might be that Intersect could cause it to load more records in memory than you would want, but I'm unable to test to confirm if that's the case.

Chris
  • 707
  • 5
  • 13