2

I have the following query that finds price reductions for specific listings in our real estate database. I want to know if this query can be done more efficiently. The end goal is to get the last 2 price changes for listings where the most recent price change is less than the one before it. Here is my query:

var PriceDrops = idxContext.ListingPriceChanges
     .Where(a => a.DateAdded >= LastRunTime && ListingIDsChunk.Contains(a.ListingID))
     .GroupBy(g => g.ListingID)
     .Where(g => g.Count() > 1 && g.OrderByDescending(a => a.DateAdded).FirstOrDefault().ListPrice < g.OrderByDescending(a => a.DateAdded).Skip(1).FirstOrDefault().ListPrice)
     .SelectMany(g => g.OrderByDescending(a => a.DateAdded).Take(2)).ToList();

This query works, I just want to know if it can be done more efficiently.

Here is some more info as requested:

ListingIDChunk size is 2000 id's at a time

The time it takes to execute: 20.4172256 seconds (based off stopwatch)

The SQL it generates is:

SELECT 
[Limit3].[ListingPriceChangeID] AS [ListingPriceChangeID], 
[Limit3].[ListingID] AS [ListingID], 
[Limit3].[ListPrice] AS [ListPrice], 
[Limit3].[PriceChangeDate] AS [PriceChangeDate], 
[Limit3].[DateAdded] AS [DateAdded]
FROM   (SELECT [Project7].[ListingID] AS [ListingID]
    FROM ( SELECT 
        [Project6].[ListingID] AS [ListingID], 
        [Project6].[ListPrice] AS [ListPrice], 
        [Project6].[ListPrice1] AS [ListPrice1], 
        (SELECT 
            COUNT(1) AS [A1]
            FROM [dbo].[ListingPriceChanges] AS [Extent4]
            WHERE ([Extent4].[DateAdded] >= @p__linq__0) AND ([Extent4].[ListingID] IN (REMOVED-2000-IDs)) AND ([Project6].[ListingID] = [Extent4].[ListingID])) AS [C1]
        FROM ( SELECT 
            [Project4].[ListingID] AS [ListingID], 
            [Project4].[ListPrice] AS [ListPrice], 
            [Limit2].[ListPrice] AS [ListPrice1]
            FROM   (SELECT 
                [Project2].[ListingID] AS [ListingID], 
                [Limit1].[ListPrice] AS [ListPrice]
                FROM   (SELECT 
                    @p__linq__0 AS [p__linq__0], 
                    [Distinct1].[ListingID] AS [ListingID]
                    FROM ( SELECT DISTINCT 
                        [Extent1].[ListingID] AS [ListingID]
                        FROM [dbo].[ListingPriceChanges] AS [Extent1]
                        WHERE ([Extent1].[DateAdded] >= @p__linq__0) AND ([Extent1].[ListingID] IN (REMOVED-2000-IDs))
                    )  AS [Distinct1] ) AS [Project2]
                OUTER APPLY  (SELECT TOP (1) [Project3].[ListPrice] AS [ListPrice]
                    FROM ( SELECT 
                        [Extent2].[ListPrice] AS [ListPrice], 
                        [Extent2].[DateAdded] AS [DateAdded]
                        FROM [dbo].[ListingPriceChanges] AS [Extent2]
                        WHERE ([Extent2].[DateAdded] >= @p__linq__0) AND ([Extent2].[ListingID] IN (REMOVED-2000-IDs)) AND ([Project2].[ListingID] = [Extent2].[ListingID])
                    )  AS [Project3]
                    ORDER BY [Project3].[DateAdded] DESC ) AS [Limit1] ) AS [Project4]
            OUTER APPLY  (SELECT TOP (1) [Project5].[ListPrice] AS [ListPrice], [Project5].[DateAdded] AS [DateAdded]
                FROM ( SELECT [Project5].[ListPrice] AS [ListPrice], [Project5].[DateAdded] AS [DateAdded], row_number() OVER (ORDER BY [Project5].[DateAdded] DESC) AS [row_number]
                    FROM ( SELECT 
                        [Extent3].[ListPrice] AS [ListPrice], 
                        [Extent3].[DateAdded] AS [DateAdded]
                        FROM [dbo].[ListingPriceChanges] AS [Extent3]
                        WHERE ([Extent3].[DateAdded] >= @p__linq__0) AND ([Extent3].[ListingID] IN (REMOVED-2000-IDs)) AND ([Project4].[ListingID] = [Extent3].[ListingID])
                    )  AS [Project5]
                )  AS [Project5]
                WHERE [Project5].[row_number] > 1
                ORDER BY [Project5].[DateAdded] DESC ) AS [Limit2]
        )  AS [Project6]
    )  AS [Project7]
    WHERE ([Project7].[C1] > 1) AND ([Project7].[ListPrice] < [Project7].[ListPrice1]) ) AS [Filter5]
CROSS APPLY  (SELECT TOP (2) [Project8].[ListingPriceChangeID] AS [ListingPriceChangeID], [Project8].[ListingID] AS [ListingID], [Project8].[ListPrice] AS [ListPrice], [Project8].[PriceChangeDate] AS [PriceChangeDate], [Project8].[DateAdded] AS [DateAdded]
    FROM ( SELECT 
        [Extent5].[ListingPriceChangeID] AS [ListingPriceChangeID], 
        [Extent5].[ListingID] AS [ListingID], 
        [Extent5].[ListPrice] AS [ListPrice], 
        [Extent5].[PriceChangeDate] AS [PriceChangeDate], 
        [Extent5].[DateAdded] AS [DateAdded]
        FROM [dbo].[ListingPriceChanges] AS [Extent5]
        WHERE ([Extent5].[DateAdded] >= @p__linq__0) AND ([Extent5].[ListingID] IN (REMOVED-2000-IDs)) AND ([Filter5].[ListingID] = [Extent5].[ListingID])
    )  AS [Project8]
    ORDER BY [Project8].[DateAdded] DESC ) AS [Limit3]

There are currently 3,239,193 records in the ListingPriceChanges table.

If you need more info, please let me know. In the query above, I replaced the 2000 id's with the text REMOVED-2000-IDs

I'm using EF 5.0 and .NET 4.5

The ListingPriceChanges table is defined as:

[ListingPriceChangeID] [int] IDENTITY(1,1) NOT NULL,
[ListingID] [int] NOT NULL,
[ListPrice] [money] NOT NULL,
[PriceChangeDate] [datetime2](7) NULL,
[DateAdded] [datetime2](7) NOT NULL

The field ListingID is a FK reference to the Listings table. There is also an index for the table on the ListingID field that includes PriceChangeDate.

Ricketts
  • 5,025
  • 4
  • 35
  • 48
  • 3
    That's pretty much impossible to answer with the information provided. We don't know what SQL it's actually generating. How long that SQL is taking now. What the execution plan is. What the physical data structures are now. And there are more to ask yet. – Mike Perrenoud Dec 03 '13 at 23:57
  • 2
    Short answer: write custom SQL or an SPROC. – Robert Harvey Dec 04 '13 at 00:03
  • 2
    @RobertHarvey Except that is not LINQ. This code is not even complex, and there is no reason to fallback to manual SQL writing. The OP didn't even state he has a performance problem. He's just asking how can this query be improved. – Federico Berasategui Dec 04 '13 at 00:06
  • @Ricketts do you even have a performance issue here, or asking out of curiosity? What version of EF and .Net are you targetting? – Federico Berasategui Dec 04 '13 at 00:07
  • @HighCore When it is a "normal" check, no there is no performance issues. When stressed (thousands of listings to check), then as you see it takes 20 seconds to check 2000 listings. The same query checking 134 listings (more realistic) takes 0.2354799 seconds (again from stopwatch) – Ricketts Dec 04 '13 at 00:16
  • Hah, EF, Efficient... – Phill Dec 04 '13 at 00:24
  • I assume ListingID is indexed? Can we see your table definitions? – Robert Harvey Dec 04 '13 at 00:40
  • @RobertHarvey Please see updated question, yes the table does have an index of the `ListingID` field. – Ricketts Dec 04 '13 at 00:48
  • I can see a case for a couple of well-placed `let` statements, particularly for the list prices in those lambda expressions. It's all those subselects that are hammering your performance; if you can coax EF to remove those, you should be good to go. – Robert Harvey Dec 04 '13 at 00:50
  • I figured this line is what consumed most of the processing time `.Where(g => g.Count() > 1 && g.OrderByDescending(a => a.DateAdded).FirstOrDefault().ListPrice < g.OrderByDescending(a => a.DateAdded).Skip(1).FirstOrDefault().ListPrice)` But I don't know how to re-write it, so that is why I'm here. – Ricketts Dec 04 '13 at 00:55
  • 1
    `Contains` is known to be slow (with 2000 elements or more): http://stackoverflow.com/questions/8107439/entity-framework-4-1-most-efficient-way-to-get-multiple-entities-by-primary-key/8108643#8108643 This *could* be a reason for the poor performance. What makes it actually slow isn't the SQL itself but the translation from LINQ into SQL. (And translated queries that have a `Contains` in it are not cached.) You can narrow it down by using `ToString()` instead of `ToList()` at the end of the query which just forces generating SQL without executing it. The question is if the `ToString()` is slow. – Slauma Dec 04 '13 at 19:10
  • @Slauma You are correct in that the translation does take a majority of the time. In my prior example (20 seconds to evaluate 2000 listings), the translation took 14 seconds and the query took 6 seconds. – Ricketts Dec 04 '13 at 21:20

2 Answers2

3

I can't test at the moment, but I believe you could do the following:

var PriceDrops = idxContext.ListingPriceChanges
    .Where(a => a.DateAdded >= LastRunTime && ListingIDsChunk.Contains(a.ListingID))
    .GroupBy(g => g.ListingID)
    .Where(g => g.Count() > 1)
    .Select(g => g.OrderByDescending(a => a.DateAdded).Take(2))
    .Where(g => g.First().ListPrice < g.Skip(1).First().ListPrice)
    .SelectMany(g => g)
    .ToList();

This should evaluate to fewer subqueries, which I believe may help your overall performance. It's also a bit simpler to follow.

Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • This won't work because `.Last()` isn't supported in EF. The only way to get to that 2nd record to compare the `ListPrice` that I can tell is to use `Skip(1)`, however, you can't use `Skip(1)` without first ordering the results. – Ricketts Dec 04 '13 at 21:05
  • @Ricketts I had already ordered them - so you should be able to use Skip(1).First(), then - will edit Note the Select just above athat where... – Reed Copsey Dec 04 '13 at 21:20
  • I believe you are only ordering the individual groups, not the entire collection. I already had tried changing to skip and it throws the following exception: `The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'.` Also, you can't use `First()` either, you need to change to `FirstOrDefault()`. – Ricketts Dec 04 '13 at 21:27
  • 1
    Custom SQL or an SPROC is starting to look like a better option. – Robert Harvey Dec 04 '13 at 23:01
1

Using a modification to Reed's answer, I was able to drop the average execution time from 20 seconds to 13 seconds on a 2000 listing batch. I used the solution below and dropped the average execution time to around 3 seconds. I needed to call .ToList() after I selected the initial collection but before the ListPrice comparison.

var PriceDrops = idxContext.ListingPriceChanges
        .Where(a => a.DateAdded >= LastRunTime && ListingIDsChunk.Contains(a.ListingID))
    .GroupBy(g => g.ListingID)
    .Where(g => g.Count() > 1)
    .Select(g => g.OrderByDescending(a => a.DateAdded).Take(2))
    .ToList()
    .Where(g => g.First().ListPrice < g.Last().ListPrice)
    .SelectMany(g => g)
    .ToList();
Ricketts
  • 5,025
  • 4
  • 35
  • 48