2

I created an application (ASP.net MVC 4) for a client. This application uses Entity Framework to query the DB (SQL Server 2012).

The client asked me to create a new report for the application. This Report will display all products from their database and show how many were sold (TotalOut) and how many were bought and put in stock (TotalIn). Because this report needs to calculate the values I chose to create a Function on the DB.

To create this function I made use of 3 tables.

  • Products --> Table with all the products sold by the company
  • Storage --> Table with all the different stocks of a product. Storage has 1 Product. Product Has 0 or Many Storages.
  • StorageHistory --> Table with the history of a Storage. When a product is sold or bought , a record will be added to this table. StorageHistory has 1 Storage. Storage has 0 or many StorageHistories.

So I created my function and added it to my EF DB context model (edmx). Using LINQ I add extra queries to this function.

ex. The user is able to query only the products of a specific manufacturer.

The Product Table contains almost 15k records. I added paging so the user can only see 15 records at once. But the paging is causing a lot of performance problems.

By default I Order the products by their Id. The Id is the primary key and is of type String. This Id is the productCode an is always in the following format (ABC123456).

Problem:

When I Order By the Id (ASC or DESC) the query takes 13 seconds to execute. But when I change the Order By to any other column (ex. Name) the query takes less than a second to execute.

I already checked my LINQ to EF code to make sure everything is queried on DB level. These are the two SQL queries generated when using Skip() and Take(). When excuting this directly on the DB i get the same result.

When Ordering on Id --> 12 seconds execution time

SELECT TOP (15) 
[Project1].[C1] AS [C1], 
[Project1].[Id] AS [Id], 
[Project1].[Name] AS [Name], 
[Project1].[SellingPrice] AS [SellingPrice], 
[Project1].[TotalIn] AS [TotalIn], 
[Project1].[TotalOut] AS [TotalOut]
FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Name] AS [Name], [Project1].[SellingPrice] AS [SellingPrice], [Project1].[TotalIn] AS [TotalIn], [Project1].[TotalOut] AS [TotalOut], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[Name] AS [Name], 
        [Extent1].[SellingPrice] AS [SellingPrice], 
        [Extent1].[TotalIn] AS [TotalIn], 
        [Extent1].[TotalOut] AS [TotalOut], 
        1 AS [C1]
        FROM [dbo].[GetProductStatistics](NULL, NULL) AS [Extent1]
    )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 14200
ORDER BY [Project1].[Id] ASC

When Ordering on Name --> less than 1 second execution time

SELECT TOP (15) 
[Project1].[C1] AS [C1], 
[Project1].[Id] AS [Id], 
[Project1].[Name] AS [Name], 
[Project1].[SellingPrice] AS [SellingPrice], 
[Project1].[TotalIn] AS [TotalIn], 
[Project1].[TotalOut] AS [TotalOut]
FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Name] AS [Name], [Project1].[SellingPrice] AS [SellingPrice], [Project1].[TotalIn] AS [TotalIn], [Project1].[TotalOut] AS [TotalOut], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[Name] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[Name] AS [Name], 
        [Extent1].[SellingPrice] AS [SellingPrice], 
        [Extent1].[TotalIn] AS [TotalIn], 
        [Extent1].[TotalOut] AS [TotalOut], 
        1 AS [C1]
        FROM [dbo].[GetProductStatistics](NULL, NULL) AS [Extent1]
    )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 14200
ORDER BY [Project1].[Name] ASC

Can anybody think of some reason why the first query is so much slower? How can I fix this strange behaviour?


Edit:

This is the function I created

CREATE FUNCTION GetProductStatistics 
(   
    @StartDate DateTime = null,
    @EndDate DateTime = null
)
RETURNS TABLE 
AS
RETURN 
(
    SELECT        
        p.Id, 
        p.Name,
        p.ExternalId,
        p.Manufacturer,
        p.SellingPrice,
        SUM(CASE WHEN h.Increase = 1 AND h.Type IN ('Create', 'Refill') AND (@StartDate IS NULL OR @EndDate IS NULL OR (h.UpdatedOn BETWEEN @StartDate AND @EndDate)) THEN h.Amount ELSE 0 END) * p.SellingPrice AS TotalIn, 
        SUM(CASE WHEN h.Increase = 0 AND h.Type IN ('Used') AND (@StartDate IS NULL OR @EndDate IS NULL OR (h.UpdatedOn BETWEEN @StartDate AND @EndDate)) THEN h.Amount ELSE 0 END) * p.SellingPrice AS TotalOut
    FROM
        dbo.Products AS p 
        LEFT OUTER JOIN dbo.Storage AS s ON p.Id = s.ProductId 
        LEFT OUTER JOIN dbo.StorageHistory AS h ON s.Id = h.StorageId
    WHERE p.IsArchived = 0
    GROUP BY 
        p.Id, 
        p.Name,
        p.ExternalId,
        p.Manufacturer,
        p.SellingPrice
)
GO

Edit2:

Here is a picture of the 2 execution plans next to each other. @Yosi they are totally different…

(Click for larger view)
Picture

When I change the Order By on the 8th line of the query generated by LINQ To EF, the query runs in less than 1 second.

FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Name] AS [Name], [Project1].[SellingPrice] AS [SellingPrice], [Project1].[TotalIn] AS [TotalIn], [Project1].[TotalOut] AS [TotalOut], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]

TO

FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Name] AS [Name], [Project1].[SellingPrice] AS [SellingPrice], [Project1].[TotalIn] AS [TotalIn], [Project1].[TotalOut] AS [TotalOut], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[NAME] ASC) AS [row_number]

Edit3:

@GertArnold here is a screenshot of the general setting of the Index (Product Id). On all my tables I only have the default clustered primary key index... Product ID index

Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
Beejee
  • 1,836
  • 2
  • 17
  • 31

1 Answers1

1

A colleague of me found the solution. Because the function returns a new table and not the original product table, there is no index defined. He told me to insert the results first in a table variable where the Id is defined as primary key (indexed). The Id is a varchar in a specific pattern and therefore not easy to sort on because it isn’t really distinctive.

So I changed my custom function to the following:

CREATE FUNCTION [dbo].[GetProductStatistics] 
(   
    @StartDate DateTime = null,
    @EndDate DateTime = null
)
RETURNS @temTable TABLE (
        Id       VARCHAR(50) NOT NULL, 
        Name        VARCHAR(50) NOT NULL, 
        ExternalId    VARCHAR(50) NOT NULL, 
        Manufacturer    VARCHAR(50) NOT NULL, 
        SellingPrice  numeric(18, 2) NOT NULL, 
        TotalIn  numeric(18, 2) NOT NULL, 
        TotalOut  numeric(18, 2) NOT NULL, 
        PRIMARY KEY (Id)
    )
AS
BEGIN 
    INSERT INTO @temTable
        SELECT        
            p.Id, 
            p.Name,
            p.ExternalId,
            p.Manufacturer,
            p.SellingPrice,
            SUM(CASE WHEN h.Increase = 1 AND h.Type IN ('Create', 'Refill') AND (@StartDate IS NULL OR @EndDate IS NULL OR (h.UpdatedOn BETWEEN @StartDate AND @EndDate)) THEN h.Amount ELSE 0 END) * p.SellingPrice AS TotalIn, 
            SUM(CASE WHEN h.Increase = 0 AND h.Type IN ('Used') AND (@StartDate IS NULL OR @EndDate IS NULL OR (h.UpdatedOn BETWEEN @StartDate AND @EndDate)) THEN h.Amount ELSE 0 END) * p.SellingPrice AS TotalOut
        FROM
            dbo.Products AS p 
            LEFT OUTER JOIN dbo.Storage AS s ON p.Id = s.ProductId 
            LEFT OUTER JOIN dbo.StorageHistory AS h ON s.Id = h.StorageId
        WHERE p.IsArchived = 0
        GROUP BY 
            p.Id, 
            p.Name,
            p.ExternalId,
            p.Manufacturer,
            p.SellingPrice;
    Return;
END

I must say I was really surprised with the results. The query executes in less than a second, no matter what I sort on.

Beejee
  • 1,836
  • 2
  • 17
  • 31