Problem outline
I've got a performance problem with a view that is to be expanded to include a value associated to a 'smallest/greatest' value. Doing so more than halves the performance even with my smaller test dataset. The real database/query contains many more tables and joins, but these are all either fairly small tables, or simple FK joins. I've reduced the problem down to what I think is a minimal example, but may have inadvertently made more solutions possible that aren't practical with the real case. Please assume in any solution there are an additional ~10-20 ish tables and joins involved, though all are of the simple kind; e.g. lookup tables for various values, like the names of vehicles and such (the real db is more normalized). For example the 'City' value is not actually in Deliveries, but two JOINs removed, but those JOINS are 1:1, they can only ever return a single-row per source row. So any solution that completely upends the underlying query and thus makes it much more difficult to expand it with a simple join is not a great solution. Assume the following (example!) database tables. All number tables are INTs, all 'ID' columns have a PK, all 'fkX' are a foreign key relation to the named table.
Example Data
Table: Deliveries
+----+------------+---------+--------+
| ID | City | fkRoute | Number |
+----+------------+---------+--------+
| 1 | New York | 1 | 1 |
| 2 | Boston | 1 | 2 |
| 3 | Baltimore | 2 | 1 |
| 4 | Baltimore | 2 | 1 |
| 5 | Washington | 2 | 2 |
| 6 | Washington | 3 | 1 |
| 7 | Newark | 3 | 1 |
| 8 | Portland | 3 | 2 |
| 9 | Yonkers | 3 | 3 |
+----+------------+---------+--------+
Table: [Routes]
+----+------------+---------+
| ID | Date | Vehicle |
+----+------------+---------+
| 1 | 2021-07-07 | Truck A |
| 2 | 2021-07-07 | Lorry B |
| 3 | 2021-07-08 | Truck A |
+----+------------+---------+
Table: Products
+----+------------+-------+
| ID | fkDelivery | Name |
+----+------------+-------+
| 1 | 1 | W597 |
| 2 | 2 | W597 |
| 3 | 3 | T224 |
| 4 | 4 | DW F5 |
| 5 | 4 | W587 |
| 6 | 5 | W597 |
| 7 | 5 | W613 |
| 8 | 6 | DW F8 |
| 9 | 7 | T224 |
| 10 | 8 | W597 |
| 11 | 8 | DW-F8 |
| 12 | 8 | AS71 |
| 13 | 9 | T224 |
+----+------------+-------+
The problematic Query
Next, I have this current view (made using just some simple joins, a GROUP BY on [Routes].ID, and some aggregate 'COUNT()' to get the numbers). Such a query would look like:
SELECT MIN([Routes].Date) AS Date, MIN([Routes].Vehicle) AS Vehicle,
COUNT(Products.Name) AS nP, COUNT(DISTINCT Deliveries.ID) AS nD
FROM [Routes]
INNER JOIN Deliveries ON Deliveries.fkRoute = [Routes].ID
INNER JOIN Products ON Products.fkDelivery = Deliveries.ID
GROUP BY [Routes].ID
It measures number of stuff delivered, and number of locations to deliver to.
Current View:
+------------+---------+----+----+
| Date | Vehicle | nP | nD |
+------------+---------+----+----+
| 2021-07-07 | Truck A | 2 | 2 |
| 2021-07-07 | Lorry B | 5 | 3 |
| 2021-07-08 | Truck A | 6 | 4 |
+------------+---------+----+----+
A Typical Query that uses this view is:
SELECT * FROM routeView WHERE '2021-07-07 00:00:00' <= [Date]
AND '2021-07-08 23:59:59' >= [Date]
Now this view is to be expanded. With a column that is proving to be a nasty thorn to program efficiently.
New View:
+------------+---------+----+----+-----------+
| Date | Vehicle | nP | nD | FirstCity |
+------------+---------+----+----+-----------+
| 2021-07-07 | Truck A | 2 | 2 | New York |
| 2021-07-07 | Lorry B | 5 | 3 | Baltimore |
| 2021-07-08 | Truck A | 6 | 4 | Newark |
+------------+---------+----+----+-----------+
FirstCity is the City corresponding to the 'lowest' Number. This can be ambiguous (users don't number their [Routes] consistently) by multiple cities having the lowest number; in that case it should just pick the first row. Numbers may include zero, but not negative values. My initial idea is this:
SELECT MIN([Routes].Date) AS Date, MIN([Routes].Vehicle) AS Vehicle,
COUNT(Products.Name) AS nP, COUNT(DISTINCT Deliveries.ID) AS nD,
MIN(CASE WHEN Deliveries.Number = minTable.minDN THEN Deliveries.City END) AS FirstCity
FROM [Routes]
INNER JOIN Deliveries ON Deliveries.fkRoute = [Routes].ID
INNER JOIN Products ON Products.fkDelivery = Deliveries.ID
LEFT JOIN (SELECT MIN(D2.Number) AS minDN, fkRoute
FROM Deliveries AS D2 GROUP BY fkRoute) AS minTable ON [Routes].ID = minTable.fkRoute
GROUP BY [Routes].ID
This query returns the view above. It makes the arbitrary choice to return the 'alphabetically' first city when there's ambiguous data (there's no requirement there). It works... but not very fast. With the example data it's no problem, but with a real data set (of say some millions of rows), the main problem is with the sub-query. It scans/fetches the whole 'Deliveries' table. But it really only needs to look at the deliveries between 07-07 and 07-08 (with the example query), so it's doing lots of redundant work. I've been trying to adapt some of the solutions in 'SQL antipatterns', ch. 15, but without much success; I either get syntax errors for some of them, or they mess up the 'nP'/'nD' data as the number of rows returned from the primary 'base' query is changed through WHERE conditions.
What would be a good way to achieve the desired speedup?
Note: If it were just a query then I could add the 'date' condition to both the sub-query and the primary query as a trivial solution. In that case the index is used on both queries. Yet this method doesn't seem to work for a view.
To reproduce:
Full T-SQL code to re-create the test database below:
USE [testDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[[Routes]](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Date] [date] NOT NULL,
[Vehicle] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_[Routes]] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Deliveries](
[ID] [int] IDENTITY(1,1) NOT NULL,
[City] [nvarchar](50) NOT NULL,
[fkRoute] [int] NOT NULL,
[Number] [int] NOT NULL,
CONSTRAINT [PK_Deliveries] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Deliveries] WITH CHECK ADD CONSTRAINT [FK_Deliveries_[Routes]] FOREIGN KEY([fkRoute])
REFERENCES [dbo].[[Routes]] ([ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[Deliveries] CHECK CONSTRAINT [FK_Deliveries_[Routes]]
GO
CREATE TABLE [dbo].[Products](
[ID] [int] IDENTITY(1,1) NOT NULL,
[fkDelivery] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Products] WITH CHECK ADD CONSTRAINT [FK_Products_Deliveries] FOREIGN KEY([fkDelivery])
REFERENCES [dbo].[Deliveries] ([ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[Products] CHECK CONSTRAINT [FK_Products_Deliveries]
GO
INSERT INTO [[Routes]] (Date, Vehicle) VALUES ('2021-07-07', 'Truck A'), ('2021-07-07', 'Lorry B'), ('2021-07-08', 'Truck A')
GO
INSERT INTO Deliveries (City, fkRoute, Number) VALUES
('New York', 1, 1),
('Boston', 1, 2),
('Baltimore', 2, 1),
('Baltimore', 2, 1),
('Washington', 2, 2),
('Washington', 3, 1),
('Newark', 3, 1),
('Portland', 3, 2),
('Yonkers', 3, 3)
GO
INSERT INTO [Products] (fkDelivery, Name) VALUES
('1', 'W597'),
('2', 'W597'),
('3', 'T224'),
('4', 'DW F5'),
('4', 'W587'),
('5', 'W597'),
('5', 'W613'),
('6', 'DW F8'),
('7', 'T224'),
('8', 'W597'),
('8', 'DW-F8'),
('8', 'AS71'),
('9', 'T224')
GO
CREATE NONCLUSTERED INDEX [inRoutes_Date] ON [dbo].[Routes]
(
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO