50

Counting rows in tables that hold a large amount of data can be very slow, sometimes taking minutes; it can also cause deadlocks on a busy server. I want to display real values, so NOLOCK is not an option.

The servers I use are SQL Server 2005 or 2008, Standard or Enterprise edition, if that matters. I can imagine that SQL Server maintains a row count for every table, and if there is no WHERE clause I could get that number pretty quickly, right?

For example:

SELECT COUNT(*) FROM myTable

should return immediately with the correct value. Do I need to rely on statistics being up to date?

Adi
  • Get a [query execution plan](http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan), then we can talk... (`SELECT COUNT` queries the table directly rather than using statistics, since statistics may be out of date) – Justin Sep 18 '12 at 15:05
  • Silly question, but do you have an index? – Kermit Sep 18 '12 at 15:10
  • @FreshPrinceOfSO it is still slow if you count on `Id` (bigint, primary key, identity specification=true). – ANeves Nov 07 '13 at 16:50
  • @FreshPrinceOfSO would it make any diff? yes, I have a clustered index actually. – Adi Nov 12 '13 at 21:44
  • If the count is needed frequently, we can store it in another table. I was facing the same problem and used IF EXISTS to fix it. – Banketeshvar Narayan Feb 11 '14 at 15:50
  • I have a table with 1 billion records. Doing a count with a date range and a specific column value was taking me 20 minutes to pull a count of 85,000 records. I guess this is normal given the size. – Michael Fever Feb 27 '19 at 22:42

5 Answers

84

A very close approximation (ignoring any in-flight transactions) would be:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  WHERE t.name = N'myTable'
  AND s.name = N'dbo'
  AND p.index_id IN (0,1);

This will return much, much more quickly than COUNT(*), and if your table changes quickly enough, it's not really any less accurate. If your table changed between the time your COUNT started (and locks were taken) and the time it returned (when locks were released and all the waiting write transactions were finally allowed to write to the table), is that count really much more valuable? I don't think so.

If you have some subset of the table you want to count (say, rows WHERE some_column IS NULL), you can create a filtered index on that column, and structure its WHERE clause one way or the other depending on whether that subset is the exception or the rule (create the filtered index on the smaller set). So, one of these two indexes:

CREATE INDEX IAmTheException ON dbo.table(some_column)
  WHERE some_column IS NULL;

CREATE INDEX IAmTheRule ON dbo.table(some_column)
  WHERE some_column IS NOT NULL;

Then you could get the count in a similar way using:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  INNER JOIN sys.indexes AS i
  ON p.[object_id] = i.[object_id]
  AND p.index_id = i.index_id
  WHERE t.name = N'myTable'
  AND s.name = N'dbo'
  AND i.name = N'IAmTheException'; -- or N'IAmTheRule'
  -- (no p.index_id IN (0,1) filter here: a filtered index is
  -- nonclustered, so its index_id will be 2 or higher)

And if you want to know the opposite, just subtract that result from the first query above.
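
For example, to count the rows not covered by the filtered index (a sketch, assuming the dbo.myTable and IAmTheException names from above):

-- total rows minus rows tracked by the filtered index
SELECT
  (SELECT SUM(p.rows)
   FROM sys.partitions AS p
   INNER JOIN sys.tables AS t ON p.[object_id] = t.[object_id]
   INNER JOIN sys.schemas AS s ON s.[schema_id] = t.[schema_id]
   WHERE t.name = N'myTable' AND s.name = N'dbo'
   AND p.index_id IN (0,1))
 -
  (SELECT SUM(p.rows)
   FROM sys.partitions AS p
   INNER JOIN sys.tables AS t ON p.[object_id] = t.[object_id]
   INNER JOIN sys.schemas AS s ON s.[schema_id] = t.[schema_id]
   INNER JOIN sys.indexes AS i ON p.[object_id] = i.[object_id]
   AND p.index_id = i.index_id
   WHERE t.name = N'myTable' AND s.name = N'dbo'
   AND i.name = N'IAmTheException')
 AS rows_where_some_column_is_not_null;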

Aaron Bertrand
  • This can also be used when no index is present. – Kermit Sep 18 '12 at 15:27
  • your solution looked very promising, but after testing on about 5 million records I got the same response times. Will test on a larger database soon. – Adi Sep 18 '12 at 21:01
  • @Adi A query against sys.partitions took a long time? I find that very hard to believe. – Aaron Bertrand Sep 18 '12 at 21:02
  • @Aaron Bertrand sorry, I wanted to say that on my current database the two statements took the same time (under 1 second), so my 5-million-record test is irrelevant; will do the test on a larger DB. – Adi Sep 19 '12 at 07:03
  • And make sure you perform the test on a busy DB where the table is actively being accessed / written to. An idle DB is not a very good test. – Aaron Bertrand Sep 19 '12 at 11:04
  • @Aaron Bertrand can this method be adapted to more complex queries like selecting using a WHERE clause or joining multiple tables? – Adi Sep 28 '12 at 08:17
  • @Adi no, this is for independent, whole tables. If you have a filtered index that matches your WHERE clause, you could check that index_id specifically. For a JOIN I think the only way to bypass a count would be to have an indexed view... – Aaron Bertrand Sep 28 '12 at 09:49
  • Related: http://blogs.msdn.com/b/martijnh/archive/2010/07/15/sql-server-how-to-quickly-retrieve-accurate-row-count-for-table.aspx (dm_db_partition_stats works on Azure) – Imran Qadir Baksh - Baloch May 09 '15 at 07:51
  • Man, this works at the speed of light. Blazingly fast! **I had `178090131` records in my table**. `Select count(1)` was taking more than a minute, causing timeouts on the C# side with Entity Framework's default timeout value. **Now it returns in zero seconds, literally**. Though my table isn't big in terms of number of columns: I have only two columns, `Id` and `Name`, with a clustered index on `Id`, but I believe the number of columns shouldn't matter as far as this solution is concerned. – RBT Aug 20 '16 at 02:02
  • The query in this answer does appear to work on Azure SQL at this point in time. I'm testing in a P2 DB on a table with 200M+ rows and about 5K new rows inserted in bulk each minute. The query in this answer returns near instantly; a naive SELECT COUNT(*) takes about a minute (it uses an index scan). Both return the same number. – quentin-starin Jul 08 '17 at 18:37
  • A table I have in a SQL Azure DB with some 42 million rows: I gave up on a query with the plain SELECT COUNT(*)... after 11+ minutes of waiting. This solution was near instantaneous, < 1 second. Crazy that that horrendous-looking query does the job, but it did :0) – Nicholas Petersen May 11 '18 at 21:25
  • If you have a lot of tables in the DB as well (10s of thousands) this will be quite slow too. This query seems to perform better: SELECT SUM (partitions.rows) FROM sys.partitions partitions WHERE partitions.object_id = object_id('Lombiq_Hosting_Tenants') AND partitions.index_id < 2 – Piedone Aug 11 '18 at 20:57
  • @Piedone You can run my query under read uncommitted if you have a ridiculous number of objects. You can’t do that with `OBJECT_ID()`. See https://blogs.sentryone.com/aaronbertrand/bad-habits-metadata-helper-functions/ – Aaron Bertrand Aug 12 '18 at 00:41
  • I don't think I understand why read uncommitted would help, as I tested this on a DB without load (but with 10s of thousands of tables). – Piedone Aug 12 '18 at 13:33
  • @Piedone So did you do any investigation whatsoever into what exactly was taking time? Or did you just run it once and assume that it must be the code and couldn't be any other influence like blocking? – Aaron Bertrand Aug 12 '18 at 13:43
  • I took minimal time on this since I don't need anything more in-depth, hence a quick tip to try for anybody else who might bump into the same thing. As I mentioned I've run it on a DB without load, as in no other connection being open to the DB apart from the one running this single query - I don't know how blocking would happen here. – Piedone Aug 12 '18 at 17:41
  • @Piedone Hundreds of ways. Any Sch-M lock on any index, table, schema, etc. This can be from having a specific node open in Object Explorer, using any designer, IntelliSense, etc. etc. Blaming the join because it was slower than a different query you ran at a different time, without doing any investigation into *why* one was slower than the other, is unfair to anyone, yourself included. This is how misconceptions and myths are formed. I ran these against a 100,000-table database, repeatedly, under no load, and observed no perf diff (these are milliseconds): https://i.stack.imgur.com/MgxwC.png – Aaron Bertrand Aug 12 '18 at 18:40
  • Look I'm not "blaming" the join, nor am I making an elaborate statement arguing for the other query. I have no interest in investigating why the difference is there because I don't need that information and I have other obligations. It's just that I needed a particular thing, came here, found this didn't work well for me, but found something else that did and wanted to share in case it's useful for others. – Piedone Aug 12 '18 at 20:02
  • @AaronBertrand It is fast, but how can I exclude soft-deleted data, such as rows where deleted_at IS NULL? – Saroj Shrestha Nov 26 '18 at 05:53
  • @SarojShrestha Updated the answer to address that case. – Aaron Bertrand Feb 20 '19 at 14:58
  • Minor correction: I believe that the line `AND s.name = N'IAmTheException' -- or N'IAmTheRule'` should read `AND i.name = ...` – pcdev May 16 '19 at 05:36
12

(How large is a "large amount of data"? I should have asked in a comment first, but maybe the exec below helps you out already.)

If I run COUNT(*) on a static table (meaning no one has been hitting it with reads/writes/updates for quite a while, so contention is not an issue) with 200 million rows, it takes 15 seconds on my dev machine (Oracle). Considering the sheer amount of data, that is still quite fast (at least to me).

Since you said NOLOCK is not an option, you could consider

exec sp_spaceused 'myTable'

as well.

But this boils down to nearly the same thing as NOLOCK (it ignores contention plus in-flight deletes/updates, AFAIK).
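
If you need the row count programmatically rather than as a printed result set, one option is to capture the output into a temp table, along these lines (a sketch; the column list mirrors the single result set sp_spaceused returns for one object):

-- Capture sp_spaceused output so the rows value can be reused in queries
CREATE TABLE #SpaceUsed
(
    name       sysname,
    [rows]     bigint,      -- returned as char, implicitly converted on insert
    reserved   varchar(50),
    data       varchar(50),
    index_size varchar(50),
    unused     varchar(50)
);

INSERT INTO #SpaceUsed
EXEC sp_spaceused 'myTable';

SELECT [rows] FROM #SpaceUsed;

DROP TABLE #SpaceUsed;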

Najzero
6

I've been working with SSMS for well over a decade and only in the past year found out that it can give you this information quickly and easily, thanks to this answer.

  1. Select the "Tables" folder from the database tree (Object Explorer)
  2. Press F7 or select View > Object Explorer Details to open the Object Explorer Details view
  3. In this view you can right-click on the column header to choose the columns you want to see, including table space used, index space used, and row count

Note that support for this in Azure SQL databases seems a bit spotty at best. My guess is that the queries from SSMS are timing out, so it only returns a handful of tables on each refresh; however, the highlighted one always seems to be returned.
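
For reference, a query along these lines returns roughly the same per-table row counts that the details view displays (a sketch against sys.dm_db_partition_stats; not necessarily the exact query SSMS runs internally):

-- Approximate row count for every table in the current database
SELECT s.name AS [schema], t.name AS [table], SUM(ps.row_count) AS [rows]
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables AS t ON ps.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s ON t.[schema_id] = s.[schema_id]
WHERE ps.index_id IN (0, 1)  -- heap or clustered index partitions only
GROUP BY s.name, t.name
ORDER BY [rows] DESC;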

pcdev
2

COUNT will do either a table scan or an index scan, so for a high number of rows it will be slow. If you do this operation frequently, the best approach is to keep the count in a separate table.
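
A minimal sketch of that idea (the counter table and trigger names here are hypothetical, and on a very busy table the single counter row can itself become a contention point):

CREATE TABLE dbo.TableRowCounts
(
    TableName sysname PRIMARY KEY,
    RowCnt    bigint NOT NULL
);

-- Seed once with the current count
INSERT INTO dbo.TableRowCounts (TableName, RowCnt)
SELECT N'myTable', COUNT_BIG(*) FROM dbo.myTable;
GO

-- Keep the stored count in sync on every insert/delete
-- (note: TRUNCATE TABLE does not fire triggers)
CREATE TRIGGER dbo.trg_myTable_RowCount
ON dbo.myTable
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.TableRowCounts
       SET RowCnt = RowCnt + (SELECT COUNT_BIG(*) FROM inserted)
                           - (SELECT COUNT_BIG(*) FROM deleted)
     WHERE TableName = N'myTable';
END;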

If, however, you do not want to do that, you can create a dummy index (one that will not be used by your queries) and query its number of items, something like:

select
    p.row_count
from sys.dm_db_partition_stats as p
inner join sys.indexes as i
  on p.index_id = i.index_id
  and p.object_id = i.object_id
where i.object_id = object_id(N'dbo.myTable') -- index names are only unique per table
  and i.name = 'your index'
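
The dummy index itself would be something like this (a sketch; the index name is hypothetical, and `Id` is assumed to be a narrow column such as the primary key):

CREATE INDEX IAmJustForCounting ON dbo.myTable(Id);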

I am suggesting creating a new index because one that is not used by your queries will not get locked during other operations.

As Aaron Bertrand said, maintaining the extra index might be more costly than using an already existing one. So the choice is yours.

Dumitrescu Bogdan
  • But even if this index is not used for other *read* operations, it still has to be maintained for other DML. I think this dummy index is costlier than you think. – Aaron Bertrand Sep 18 '12 at 15:29
  • It might be as you say; it has to be tested. The SQL can be used on an existing index, without actually creating a new one. I used something similar on filtered indexes. I never needed to actually count the table from start to finish. – Dumitrescu Bogdan Sep 18 '12 at 15:33
-5

If you just need a rough count of the number of rows, e.g. to make sure a table loaded properly or that the data was not deleted, do the following:

mysql> USE information_schema;
mysql> SELECT table_name, table_rows FROM tables;
kwatts