Very slow SQL query

Question

I'm having a problem with a slow query. Consider the table tblVotes - and it has two columns - VoterGuid, CandidateGuid. It holds votes cast by voters to any number of candidates.

There are over 3 million rows in this table - with about 13,000 distinct voters casting votes to about 2.7 million distinct candidates. The total number of rows in the table is currently 6.5 million.

What my query is trying to achieve is getting - in the quickest and most cache-efficient way possible (we are using SQL Express) - the top 1000 candidates based on the number of votes they have received.

The code is:

SELECT CandidateGuid, COUNT(*) CountOfVotes
FROM dbo.tblVotes
GROUP BY CandidateGuid
HAVING COUNT(*) > 1
ORDER BY CountOfVotes DESC

... but this takes a scarily long time to run on SQL express when there is a very full table.

Can anybody suggest a good way to speed this up and get it running in quick time? CandidateGuid is indexed individually - and there is a composite primary key on CandidateGuid+VoterGuid.

I think you need an extra index on CountOfVotes, since your ordering is done over that and count evaluates it also. — DrCopyPaste, May 07 '13 at 14:30
CountOfVotes is calculated inside this query, it is not a permanent indexable column? — Jackfruit, May 07 '13 at 14:36
yeah, that was bad advice, but what MIGHT actually work are indexed views, that way you pay with memory for more speed here is an example question from so: http://stackoverflow.com/questions/6030143/count-big-in-indexed-view — DrCopyPaste, May 07 '13 at 14:42
Sounds interesting, but memory is at a premium here as it is SQL Express (which has a limit) .. — Jackfruit, May 07 '13 at 14:51
How long is it taking to run, and what sort of reduction in run time do you want? — , May 07 '13 at 15:03

score 1 · Accepted Answer · answered May 07 '13 at 14:43

If you have only two columns in a table, a "normal" index on those two fields won't help you much, because it is in fact a copy of your entire table, only ordered. First check in execution plan, if your index is being used at all. Then consider changing your index to clustered index.

score 0 · Answer 2 · answered May 07 '13 at 15:00

0

Try using a top n, instead of a having clause - like so:

SELECT TOP 1000 CandidateGuid, COUNT(*) CountOfVotes
FROM dbo.tblVotes
GROUP BY CandidateGuid
ORDER BY CountOfVotes DESC

answered May 07 '13 at 15:00

1

To elaborate, this way the `ORDER BY` can discard entries that aren’t in the top 1000. In fact, the original `HAVING` is superfluous. Since there is no join here, candidates with count of zero will simply not appear in the result. – Andrew Lazarus May 07 '13 at 16:06
Won't the engine still have to computer the COUNT for each GROUP? – Larry Lustig May 07 '13 at 16:36

score 0 · Answer 3 · answered May 07 '13 at 16:35

I don't know if SQL Server is able to use the composite index to speed this query, but if it is able to do so you would need to express the query as SELECT CandidateGUID, COUNT(VoterGUID) FROM . . . in order to get the optimization. This is "safe" because you know VoterGUID is never NULL, since it's part of a PRIMARY KEY.

If your composite primary key is specified as (CandidateGUID, VoterGUID) you will not get any added benefit of a separate index on just CandidateGUID -- the existing index can be used to optimize any query that the singleton index would assist in.

Very slow SQL query

3 Answers3