2

I've been using Stopwatch and it looks like the below query is very expensive in terms of performance, even though what I already have below I find most optimal based on various reading (change foreach loop with for, use arrays instead of collection, using anonymous type not to take the whole table from DB). Is there a way to make it faster? I need to fill the prices array, which needs to be nullable. I'm not sure if I'm missing something?

public float?[] getPricesOfGivenProducts(string[] lookupProducts)
{
    var idsAndPrices = from r in myReadings select 
                       new { ProductId = r.ProductId, Price = r.Price };

    float?[] prices = new float?[lookupProducts.Length];
    for(int i=0;i<lookupProducts.Length;i++)
    {
        string id = lookupProducts[i];
        if (idsAndPrices.Any(r => r.ProductId == id))
        {
            prices[i] = idsAndPrices.Where(p => p.ProductId == id)
            .Select(a=>a.Price).FirstOrDefault();
        }
        else
        {
            prices[i] = null;
        }
    }
    return prices;
}
Turo
  • 1,537
  • 2
  • 21
  • 42

5 Answers5

4

It's likely every time you call idsAndPrices.Any(r => r.ProductId == id), you are hitting the database, because you haven't materialized the result (.ToList() would somewhat fix it). That's probably the main cause of the bad performance. However, simply loading it all into memory still means you're searching the list for a productID every time (twice per product, in fact).

Use a Dictionary when you're trying to do lookups.

public float?[] getPricesOfGivenProducts(string[] lookupProducts)
{
    var idsAndPrices = myReadings.ToDictionary(r => r.ProductId, r => r.Price);

    float?[] prices = new float?[lookupProducts.Length];

    for (int i = 0; i < lookupProducts.Length; i++)
    {
        string id = lookupProducts[i];
        if (idsAndPrices.ContainsKey(id))
        {
            prices[i] = idsAndPrices[id];
        }
        else
        {
            prices[i] = null;
        }
    }
    return prices;
}

To improve this further, we can identify that we only care about products passed to us in the array. So let's not load the entire database:

var idsAndPrices = myReadings
                       .Where(r => lookupProducts.Contains(r.ProductId))
                       .ToDictionary(r => r.ProductId, r => r.Price);

Now, we might want to avoid the 'return null price if we can't find the product' scenario. Perhaps the validity of the product id should be handled elsewhere. In that case, we can make the method a lot simpler (and we won't have to rely on having the array in order, either):

public Dictionary<string, float> getPricesOfGivenProducts(string[] lookupProducts)
{
    return myReadings
               .Where(r => lookupProducts.Contains(r.ProductId))
               .ToDictionary(r => r.ProductId, r => r.Price);
}

And a note unrelated to performance, you should use decimal for money

Community
  • 1
  • 1
Rob
  • 26,989
  • 16
  • 82
  • 98
  • thank you, the ToList() did the job for me and ChrisMantle indicated where to place it. I upvote your answer though, thank you for elaborate answer. – Turo Jun 04 '16 at 15:19
  • @Turo unless you have a very small number of products, this is more efficient than your solution as the lookup is constant time. Your lookup potentially has to search through *all* the products on each loop iteration *twice*. – Charles Mager Jun 04 '16 at 16:28
2

Assuming that idsAndPrices is an IEnumerable<T>, you should make it's initialization:

var idsAndPrices = (from r in myReadings select 
                   new { ProductId = r.ProductId, Price = r.Price })
                   .ToList();

It's likely that the calls to:

idsAndPrices.Any(r => r.ProductId == id)

and:

idsAndPrices.Where(p => p.ProductId == id)

..are causing the IEnumerable<T> to be evaluated every time it's called.

Chris Mantle
  • 6,595
  • 3
  • 34
  • 48
  • This works great, to my huge surprise, thank you, I shall accept the answer shortly. By calling ToList() I was actually expecting the opposite, to loose the benefit of working with some sort of abstraction - this makes me rethink my Linq approach. When you say IEnumerable would be evaluated every time is called with the original code - I thought this IEnumerable was already very slim by using the anonymous type? – Turo Jun 04 '16 at 15:10
  • It is slim, yes, and returning a small amount of data is good. However, the `IEnumerable` object created from your LINQ query executes a database query every time something begins enumerating it - so you can see how the `Any` and `Where` LINQ calls made subsequently cause a significant performance hit over making the database query once and caching the results in memory. – Chris Mantle Jun 07 '16 at 11:00
2

Based on

using anonymous type not to take the whole table from DB

I assume myReadings is the database table and

var idsAndPrices = 
    from r in myReadings
    select new { ProductId = r.ProductId, Price = r.Price };

is the database query.

Your implementation is far from optimal (I would rather say quite inefficient) because the above query is executed twice per each element of lookupProducts array - idsAndPrices.Any(...) and idsAndPrices.Where(...) statements.

The optimal way I see is to filter as much as possible the database query, and then use the most efficient LINQ to Objects method for correlating two in memory sequences - join, in your case left outer join:

var dbQuery = 
    from r in myReadings
    where lookupProducts.Contains(r.ProductId)
    select new { ProductId = r.ProductId, Price = r.Price };
var query =
    from p in lookupProducts
    join r in dbQuery on p equals r.ProductId into rGroup
    from r in rGroup.DefaultIfEmpty().Take(1)
    select r?.Price;
var result = query.ToArray();
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
0

The Any and FirstOrDefault are O(n) and redundant. You can get a 50% speed up just by removing theAll call. FirstOrDefault will give you back a null, so use it to get a product object (remove the Select). If you want to really speed it up you should just loop through the products and check if prices[p.ProductId] != null before setting prices[p.ProductId] = p.Price.

Kyle Sletten
  • 5,365
  • 2
  • 26
  • 39
0

bit of extra code code there

var idsAndPrices = (from r in myReadings select 
               new { ProductId = r.ProductId, Price = r.Price })
               .ToList();
for(int i=0;i<lookupProducts.Length;i++)
{
    string id = lookupProducts[i];
    prices[i] = idsAndPrices.FirstOrDefault(p => p.ProductId == id);
}

better yet

Dictionary<Int, Float?> dp = new Dictionary<Int, Float?>();
foreach(var reading in myReadings) 
    dp.add(r.ProductId, r.Price);
for(int i=0;i<lookupProducts.Length;i++)
{
    string id = lookupProducts[i];
    if(dp.Contains(id)
        prices[i] = dp[id];
    else 
        prices[i] = null;
}
paparazzo
  • 44,497
  • 23
  • 105
  • 176