
I'm trying to pull data out of my informational objects, which I have attached to a list of third-party library objects. Both solutions I have seem very inefficient. Is there a way to reduce this to a single OfType call without falling back to the longer LINQ query?

using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqQueries
{

    // Test the linq queries
    public class Test
    {
        public void TestIt()
        {
            List<ThirdParty> As = new List<ThirdParty>();

            // This is nearly the query I want to run, find A and C where B 
            // and C match criteria
            var cData = from a in As
                        from b in a.myObjects.OfType<MyInfo>()
                        where b.someProp == 1
                        from c in b.cs
                        where c.data == 1
                        select new {a, c};

            // This treats A and B as the same object, which is what I
            // really want, but it calls two sub-queries under the hood, 
            // which seems less efficient 
            var cDataShorter = from a in As
                               from c in a.GetCs()
                               where a.GetMyProp() == 1
                               where c.data == 1
                               select new { a, c };
        }
    }

    // library class I can't change
    public class ThirdParty
    {
        // Generic list of objects I can put my info object in
        public List<Object> myObjects;
    }

    // my info class that I add to ThirdParty
    public class MyInfo
    {
        public List<C> cs;
        public int someProp;
    }

    // My extension method for A to simplify some things.
    static public class MyExtentionOfThirdPartyClass
    {
        // Get the first MyInfo in ThirdParty
        public static MyInfo GetB(this ThirdParty a)
        {
            return (from b in a.myObjects.OfType<MyInfo>()
                    select b).FirstOrDefault();
        }

        // more hidden linq to slow things down...
        public static int GetMyProp(this ThirdParty a)
        {
            return a.GetB().someProp;
        }

        // get the list of cs with hidden linq
        public static List<C> GetCs(this ThirdParty a)
        {
            return a.GetB().cs;
        }
    }

    // fairly generic object with data in it
    public class C
    {
        public int data;
    }
}
Denise Skidmore
  • 1
    Run both the queries and time them with a stopwatch. Be careful about deferred execution. – P.Brian.Mackey Mar 07 '13 at 21:41
  • LINQ is optimized for readability and ease of development - not for execution speed. If there is a performance problem, try replacing LINQ with loops. Do some timing to verify the speed, and write a comment above the loop to explain what it is doing and why. – DasKrümelmonster Mar 07 '13 at 21:49
  • 1
    @DasKrümelmonster Given the operations being performed here, my guess is that the overhead added by LINQ is going to be small enough to ignore, since the true "work" is non-trivial. It's when the true "work" is really fast, and there are a lot of items, that the LINQ overhead begins to be noticed. You'd need to profile to be sure, but odds are it's not enough to matter. – Servy Mar 07 '13 at 21:50
  • What makes them seem inefficient? – Eric Lippert Mar 07 '13 at 22:57
  • So I'm just being overly concerned about efficiency? I recognize LINQ is a little slow, it just seems daft to run the same subquery twice. – Denise Skidmore Mar 07 '13 at 23:17
  • My husband comments that LINQ is optimized enough that it may not actually run the subquery twice, it may cache the results. – Denise Skidmore Mar 08 '13 at 14:40
  • 1000 Third Party objects, 1 MyObject each, 1000 c each, all results match criteria, first query is twice as fast. If no MyObjects match the criteria, Query 1 is two orders of magnitude faster. However, if you have multiple MyObjects, the efficiency reverses, 100 ThirdParty, 100 MyObjects each, 100 C each, all results matching, the second query is two orders of magnitude faster than the first. No MyObjects matching, the first comes out faster again. – Denise Skidmore Mar 08 '13 at 15:16

2 Answers


If you are saying your cDataShorter is producing a correct result, then you can rewrite it like this:

var result = As
  .SelectMany(a => a.myObjects, (aa, mo) => new R { Tp = aa, Mi = mo as MyInfo })
  // Keep only the MyInfo entries that match the criteria
  .Where(r => r.Mi != null && r.Mi.someProp == 1)
  // Uncomment Distinct if you need only one (first) MyInfo from a ThirdParty.
  // If you're not going to use Distinct, you don't need R; just use an anonymous type.
  //.Distinct(new Comparer<R>((r1, r2) => r1.Tp.Equals(r2.Tp)))
  .SelectMany(r => r.Mi.cs, (rr, c) => new { a = rr.Tp, c })
  .Where(ao => ao.c.data == 1);

public class R {
    public ThirdParty Tp;
    public MyInfo Mi;
}

For simplicity, Comparer here is a lambda-based comparer helper borrowed from elsewhere; it is not the framework's Comparer<T>.
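The helper behind the commented-out Distinct call is not shown in this answer, so here is a minimal sketch of what such a lambda-based comparer might look like. The names LambdaComparer, LambdaComparerDemo and OnePerLength are hypothetical, and this is an assumption about the general shape of the helper, not the original implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Comparer<T> helper used in the commented-out
// Distinct call; the name and shape are assumptions, not the original code.
public sealed class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;

    public LambdaComparer(Func<T, T, bool> equals)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
    }

    public bool Equals(T x, T y) => _equals(x, y);

    // Only an equality lambda is supplied, so every item gets the same hash.
    // Distinct stays correct, but it has to compare items pairwise.
    public int GetHashCode(T obj) => 0;
}

public static class LambdaComparerDemo
{
    // Keeps one word for each distinct length.
    public static List<string> OnePerLength(IEnumerable<string> words)
    {
        return words
            .Distinct(new LambdaComparer<string>((x, y) => x.Length == y.Length))
            .ToList();
    }
}
```

Because the hash is constant, Distinct with a comparer like this does quadratic work in the number of items, which matters if the point of the rewrite is speed.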

aush

Unfortunately the answer is "It Depends". I had to write the query both ways and do timing runs on it.

With 1000 ThirdParty objects, 1 MyInfo each and 1000 C each, all results matching the criteria, the first query is twice as fast. If no MyInfo matches the criteria, the first query is two orders of magnitude faster. However, with multiple MyInfo objects per ThirdParty the efficiency reverses: with 100 ThirdParty, 100 MyInfo each and 100 C each, all results matching, the second query is two orders of magnitude faster than the first. With no MyInfo matching, the first comes out faster again.
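The timing runs above could be reproduced with something along these lines. This is only a sketch: QueryTiming, Build, CountNested, CountFirstOnly and Time are hypothetical names, the extension methods are inlined, and First() replaces FirstOrDefault(), so every ThirdParty is assumed to hold at least one MyInfo. Each query ends in Count() so that deferred execution is actually forced inside the timed region.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class C { public int data; }
public class MyInfo { public List<C> cs = new List<C>(); public int someProp; }
public class ThirdParty { public List<object> myObjects = new List<object>(); }

public static class QueryTiming
{
    // Builds `parents` ThirdParty objects, each holding `infos` MyInfo entries
    // with `children` C items; someProp and data are all set to `value`.
    public static List<ThirdParty> Build(int parents, int infos, int children, int value)
    {
        return Enumerable.Range(0, parents).Select(p => new ThirdParty
        {
            myObjects = Enumerable.Range(0, infos).Select(i => (object)new MyInfo
            {
                someProp = value,
                cs = Enumerable.Range(0, children).Select(k => new C { data = value }).ToList()
            }).ToList()
        }).ToList();
    }

    // First query from the question: visits every MyInfo in each ThirdParty.
    public static int CountNested(List<ThirdParty> source) =>
        (from a in source
         from b in a.myObjects.OfType<MyInfo>()
         where b.someProp == 1
         from c in b.cs
         where c.data == 1
         select new { a, c }).Count();

    // Second query with the extension methods inlined: only the first MyInfo is
    // used, and it is looked up again for every (a, c) pair in the where clause.
    public static int CountFirstOnly(List<ThirdParty> source) =>
        (from a in source
         from c in a.myObjects.OfType<MyInfo>().First().cs
         where a.myObjects.OfType<MyInfo>().First().someProp == 1
         where c.data == 1
         select new { a, c }).Count();

    // Times a single run of the supplied query.
    public static TimeSpan Time(Func<int> run)
    {
        var sw = Stopwatch.StartNew();
        run();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling QueryTiming.Time(() => QueryTiming.CountNested(data)) and the same for CountFirstOnly on data from Build(100, 100, 100, 1) gives the two figures to compare. The two queries do not return the same rows once a ThirdParty holds more than one MyInfo, since the second only ever looks at the first one, which may account for part of the reversal.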

I actually ended up implementing the slower solution because it made the code cleaner and the performance of the slower query was not all that bad.
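If the repeated lookup is the main worry, one possible middle ground is a `let` clause, which keeps the shorter query's shape but runs OfType once per ThirdParty. LINQ to Objects does not cache method results, and in the shorter query GetMyProp() sits after the second `from`, so it is evaluated once per (a, c) pair. The sketch below was not part of the timing runs; SingleLookup and Query are hypothetical names, and returning tuples instead of an anonymous type is an assumption made so the result can leave the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class C { public int data; }
public class MyInfo { public List<C> cs = new List<C>(); public int someProp; }
public class ThirdParty { public List<object> myObjects = new List<object>(); }

public static class SingleLookup
{
    // Same rows as the shorter query (first MyInfo only), but the lookup
    // happens once per ThirdParty and a missing MyInfo is skipped, not thrown on.
    public static List<(ThirdParty a, C c)> Query(IEnumerable<ThirdParty> source)
    {
        var pairs = from a in source
                    // One OfType/FirstOrDefault per ThirdParty, kept in b
                    let b = a.myObjects.OfType<MyInfo>().FirstOrDefault()
                    where b != null && b.someProp == 1
                    from c in b.cs
                    where c.data == 1
                    select (a, c);
        return pairs.ToList();
    }
}
```

Filtering on b before the second `from` also means a ThirdParty whose MyInfo does not match never has its cs list enumerated.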

Denise Skidmore