
I have a query as follows:

 SELECT COUNT(Id) FROM Table

The table contains 33 million records - it contains a primary key on Id and no other indices.

The query takes 30 seconds.

The actual execution plan shows it uses a clustered index scan.

We analysed the table using the first query shown at this link and found it isn't fragmented: http://sqlserverpedia.com/wiki/Index_Maintenance.
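(For reference, a fragmentation check along these lines can be run directly against the table; a minimal sketch using sys.dm_db_index_physical_stats, not necessarily the exact query from the link:)

 -- Sketch only: reports fragmentation for every index on the table
 SELECT index_id, avg_fragmentation_in_percent, page_count
 FROM sys.dm_db_index_physical_stats(
          DB_ID(), OBJECT_ID('dbo.DbConversation'), NULL, NULL, 'LIMITED');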

Any ideas as to why this query is so slow, and how to fix it?

The Table Definition:

 CREATE TABLE [dbo].[DbConversation](
     [ConversationID] [int] IDENTITY(1,1) NOT NULL,
     [ConversationGroupID] [int] NOT NULL,
     [InsideIP] [uniqueidentifier] NOT NULL,
     [OutsideIP] [uniqueidentifier] NOT NULL,
     [ServerPort] [int] NOT NULL,
     [BytesOutbound] [bigint] NOT NULL,
     [BytesInbound] [bigint] NOT NULL,
     [ServerOutside] [bit] NOT NULL,
     [LastFlowTime] [datetime] NOT NULL,
     [LastClientPort] [int] NOT NULL,
     [Protocol] [tinyint] NOT NULL,
     [TypeOfService] [tinyint] NOT NULL,
     CONSTRAINT [PK_Conversation_1] PRIMARY KEY CLUSTERED
     (
         [ConversationID] ASC
     ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]
 GO

One thing I have noticed is that the database is set to grow in 1 MB chunks.

It's a live system, so we're restricted in what we can play with - any ideas?

UPDATE:

OK - we've improved performance in the actual query of interest by adding new non-clustered indices on appropriate columns, so it's not a critical issue anymore.
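(The columns indexed aren't listed here; purely as a hypothetical illustration of the shape of such an index:)

 -- Hypothetical example only: the actual column choice isn't stated above
 CREATE NONCLUSTERED INDEX IX_DbConversation_LastFlowTime
     ON [dbo].[DbConversation] ([LastFlowTime]);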

SELECT COUNT is still slow, though - we tried it with NOLOCK hints and saw no difference.

We're all thinking it's something to do with the autogrowth being set to 1 MB rather than a larger number, but we're surprised it has this effect. Could MDF fragmentation on the disk be a possible cause?
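(If autogrowth is the culprit, the increment can be raised with ALTER DATABASE; a minimal sketch, assuming hypothetical database and logical file names:)

 -- Hypothetical names: substitute the real database and logical data file name
 ALTER DATABASE [MyDatabase]
     MODIFY FILE (NAME = N'MyDatabase_Data', FILEGROWTH = 256MB);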

  • Question 1: do you really need the exact count? Or just an estimate? – ypercubeᵀᴹ Jun 21 '11 at 13:32
  • Neither - this is just a query we ran after observing slow performance on something else. We were quite surprised to find it was so slow. Gonna try updating stats, but they're set to autoupdate. – BonyT Jun 21 '11 at 13:33
  • Can't you just use a constant? I mean, how does the difference between it having ~33 million or it actually having 33,212,293 records impact you? – bevacqua Jun 21 '11 at 13:35
  • It's a guess, but have you tried playing with different isolation levels? It may be costly to do an index scan in SERIALIZABLE or SNAPSHOT. – faester Jun 21 '11 at 13:36
  • As stated - the results of this query are irrelevant - the slow database performance is however a major issue. – BonyT Jun 21 '11 at 13:36
  • 6
    @BonyT - Using a `COUNT` to diagnose slow performance isn't very illuminating. Can you open a new question with the actual issue and maybe some execution plans and/or IO statistics? – JNK Jun 21 '11 at 13:39
  • 2
    @BonyT: Exactly as @JNK advises. Post a new question with the actual slow query, the execution plan, the tables structure, indexes, etc. – ypercubeᵀᴹ Jun 21 '11 at 13:43
  • Adding NC indexes indicates a better access path onto the table that can be used by COUNT(*)... – gbn Jun 21 '11 at 14:33
  • The NC indexes didn't help with the Count – BonyT Jun 21 '11 at 14:34
  • "we've improved performance in the actual query of interest by adding new non-clustered indices on appropriate columns so it's not a critical issue anymore." -> why ask about COUNT? – gbn Jun 21 '11 at 14:49
  • It was our starter for investigating where the issue was. When this was slow we were surprised. Will defrag overnight and report back tomorrow on what the results are. – BonyT Jun 21 '11 at 14:55

4 Answers


Is this a frequently read/inserted/updated table? Is there update/insert activity concurrent with your select?

My guess is the delay is due to contention.

I'm able to run a count on 189m rows in 17 seconds on my dev server, but there's nothing else hitting that table.

If you aren't too worried about contention or absolute accuracy you can do:

 EXEC sp_spaceused 'MyTableName'

which will give a count based on metadata.

If you want a more exact count but don't necessarily care whether it reflects concurrent DELETE or INSERT activity, you can run your current query with a NOLOCK hint:

 SELECT COUNT(id) FROM MyTable WITH (NOLOCK)

which will not take row-level locks for your query and should run faster.

JNK
  • Not a very busy database - contention isn't an issue (the statement nearly always takes the exact same time to respond, 30s, and using a NOLOCK hint makes no difference). – BonyT Jun 21 '11 at 14:26
  • @BonyT - Then you have some other issues. How wide is this table? This isn't directly relevant to the `COUNT` but it does affect how many pages SQL needs to pull to get the count, which can matter at higher volumes. What other perf issues are you having? – JNK Jun 21 '11 at 14:29
  • Accepted, although we may never know what the actual problem was/is here, as we're on someone else's turf and, as I understand it, they have put a workaround in place. – BonyT Jul 12 '11 at 12:57

Thoughts:

  • Use SELECT COUNT(*), which is correct for "how many rows" (as per ANSI SQL). Even if ID is the PK and thus not nullable, SQL Server will count ID, not rows.

  • If you can live with approximate counts, then use sys.dm_db_partition_stats (see the sketch after this list). See my answer here: Fastest way to count exact number of rows in a very large table?

  • If you can live with dirty reads, use WITH (NOLOCK).
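A minimal sketch of the sys.dm_db_partition_stats approach, assuming the DbConversation table from the question (restricting index_id to 0 or 1 counts only the heap or clustered index, so rows aren't double-counted):

 SELECT SUM(ps.row_count) AS approximate_rows
 FROM sys.dm_db_partition_stats AS ps
 WHERE ps.object_id = OBJECT_ID('dbo.DbConversation')
   AND ps.index_id IN (0, 1); -- heap or clustered index only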

gbn
  • It really doesn't matter, as the optimiser effectively uses the same plan whatever you put inside the brackets. Some people are fussy about using "*" though. Whatever, this is irrelevant to the issue. – BonyT Jun 21 '11 at 14:25
  • @BonyT: If you think it's "fussy" I suggest you verify I *am aware* of the [differences](http://stackoverflow.com/questions/1221559/count-vs-count1/1221649#1221649) but count(col) may not give the same plan as count(*). Hence my suggestion – gbn Jun 21 '11 at 14:29
  • Yes we have - we actually started with * :) – BonyT Jun 21 '11 at 14:32
  • No - not you fussy - I have had people suggest I don't use "*" before - somewhere someone started a myth that using * is bad. I adjusted my question to not use * to avoid that issue lol – BonyT Jun 21 '11 at 14:33
  • @BonyT It will use the same plan but will get different results. Aggregation functions in SQL do not include nulls (this is true for COUNT as well) – Mike M. Jun 21 '11 at 14:33
  • 1
    It's a primary key - it's not going to be null! – BonyT Jun 21 '11 at 14:34
  • @BonyT: Well, my linked answer addresses this myth :-) @Mike M: ID is the PK (so NOT NULL). @Both: COUNT(*) is in the ANSI standard. http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt Look for `Scalar expressions 125` and the line just above. It may or may not be optimised of course – gbn Jun 21 '11 at 14:49
 USE [DatabaseName]

 SELECT tbl.name, dd.rows
 FROM sysindexes dd
 INNER JOIN sysobjects tbl ON dd.id = tbl.id
 WHERE dd.indid < 2 AND tbl.xtype = 'U'

 SELECT SUM(dd.rows)
 FROM sysindexes dd
 INNER JOIN sysobjects tbl ON dd.id = tbl.id
 WHERE dd.indid < 2 AND tbl.xtype = 'U'

By using these queries you can fetch every table's row count within a few seconds.

Add a WHERE clause according to your requirements.
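Note that sysindexes and sysobjects are SQL Server 2000-era compatibility views; on SQL Server 2005 and later the same idea can be sketched against the current catalog views:

 SELECT t.name, SUM(p.rows) AS rows
 FROM sys.partitions AS p
 INNER JOIN sys.tables AS t ON p.object_id = t.object_id
 WHERE p.index_id IN (0, 1) -- heap or clustered index only
 GROUP BY t.name;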

Gaurav Agrawal

Another idea: when the files grow in 1 MB increments, they may become fragmented on the file system. You can't see this from within SQL Server; you see it using a disk defragmentation tool.
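As a first check, the growth settings themselves are visible from within the database; a minimal sketch (when is_percent_growth = 0, growth is expressed in 8 KB pages):

 SELECT name, size, growth, is_percent_growth
 FROM sys.database_files;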

Stefan Steinegger