For the following query to run quickly which columns I should add to an index?
select max(id) from tb
group by BranchId,ArticleID
having count(*)>1
Your current query may not benefit from any index, because of the count(*) > 1 term, which SQL Server might interpret as taking the entire count of each group. But we can rewrite your query as follows, so that it may use an index:
SELECT MAX(id)
FROM tb
GROUP BY BranchId, ArticleID
HAVING MIN(id) <> MAX(id);
Then, add the following index:
CREATE INDEX idx ON tb (BranchId, ArticleID, Id);
The trick here is to rephrase count(*) > 1 as being logically equivalent to the smallest and largest id value in a group not being the same. Note that I assume here that id is a unique column, i.e. that a given group would never have two or more records with the same id value.
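To make the equivalence concrete, here is a minimal sketch against a throwaway temp table (the sample rows below are hypothetical, just to illustrate that both forms return the same result when id is unique):
-- Hypothetical sample data: group (10, 100) has two rows, group (20, 200) has one
CREATE TABLE #tb (id INT PRIMARY KEY, BranchId INT, ArticleID INT);
INSERT INTO #tb (id, BranchId, ArticleID)
VALUES (1, 10, 100),
       (2, 10, 100),
       (3, 20, 200);

-- Original form: COUNT(*) > 1 returns MAX(id) = 2 for group (10, 100)
SELECT MAX(id) FROM #tb GROUP BY BranchId, ArticleID HAVING COUNT(*) > 1;

-- Rewritten form: returns the same row, because id is unique,
-- so any multi-row group must span more than one id value
SELECT MAX(id) FROM #tb GROUP BY BranchId, ArticleID HAVING MIN(id) <> MAX(id);

DROP TABLE #tb;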
Apart from what Tim has already covered on ROWSTORE indexes, I want to make a few additional comments here, along with a demonstration:
As long as you are able to write a query in a way that makes the 'searched' columns (columns used in the WHERE clause, joins, GROUP BY, ORDER BY, DISTINCT) sargable (i.e. the query can make use of the indexed columns), ROWSTORE indexes are fine. But there are many situations where you might have difficulty maintaining that, or where a better approach exists; some of them are:
In those situations, using a COLUMNSTORE index can provide an impressive performance enhancement due to its massive data compression for column values coming from a similar domain. In SQL Server 2017, batch mode execution can improve the performance further. A columnstore index also gives better index reorganize results.
I cited some of the above points from Brent Ozar's presentation.
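As a quick illustration of the sargability point above (a hypothetical dbo.Orders table with an index on OrderDate is assumed here), wrapping an indexed column in a function prevents an index seek, while an equivalent range predicate allows one:
-- NOT sargable: the function is applied to the indexed column, forcing a scan
SELECT OrderId FROM dbo.Orders WHERE YEAR(OrderDate) = 2020;

-- Sargable: the same filter expressed as a range lets the optimizer seek on the index
SELECT OrderId FROM dbo.Orders WHERE OrderDate >= '20200101' AND OrderDate < '20210101';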
Here is a demonstration where I prepared two identical tables with 10 million rows each, with a certain level of cardinality.
Data Preparation:
--Test table1
drop table if exists dbo.dummy
select top 10000000 objectid1 = a.object_id, Name1 = a.name, objectid2 = b.object_id, Name2 = b.name, objectid3 = c.object_id, Name3 = c.name
into dbo.dummy
from sys.objects a
cross join sys.objects b
cross join sys.objects c
order by a.object_id, a.name
drop index if exists ix_dummy on dbo.dummy
go
--create a nonclustered rowstore index
create index ix_dummy on dbo.dummy (objectid1, objectid2, objectid3)
go
--Test Table2
drop table if exists dbo.dummy2
select top 10000000 objectid1 = a.object_id, Name1 = a.name, objectid2 = b.object_id, Name2 = b.name, objectid3 = c.object_id, Name3 = c.name
into dbo.dummy2
from sys.objects a
cross join sys.objects b
cross join sys.objects c
order by a.object_id, a.name
drop index if exists ix_dummy2 on dbo.dummy2
go
--create a nonclustered columnstore index
create nonclustered columnstore index ix_dummy2 on dbo.dummy2 (objectid1, objectid2, objectid3)
go
set statistics io on
set statistics time on
Percentage and Reads say Columnstore Index is the Winner, but Elapsed Time says Rowstore Index is the Winner
--Simple search
--Run these 2 queries together and compare their percentage of time taken with respect to each other, logical reads, and elapsed time.
select objectid3
from dbo.dummy
where objectid1 in (5) -- look for some object_id that exists in your database
select objectid3
from dbo.dummy2
where objectid1 in (5) -- look for some object_id that exists in your database
Logical Reads and Elapsed Time:
Execution Plans:
--Aggregate queries
--Run these 2 queries together and compare their percentage of time taken with respect to each other, logical reads, and elapsed time.
select max(objectid3)
from dbo.dummy
group by objectid1, objectid2
having max(objectid3) <> min(objectid3)
select max(objectid3)
from dbo.dummy2
group by objectid1, objectid2
having max(objectid3) <> min(objectid3)
Logical Reads and Elapsed Time: