
Is there a way to get the row count of a big table (billions of rows) with a condition? Is there an alternative to the following query?

select count(*) from MyBigTable where ChosenColumn = 1

Without the condition where ChosenColumn = 1, there are several solutions listed here: Fastest way to count exact number of rows in a very large table?

and

https://learn.microsoft.com/de-de/archive/blogs/martijnh/sql-serverhow-to-quickly-retrieve-accurate-row-count-for-table
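
For reference, the metadata-based approach those links describe looks roughly like this (a sketch that reads sys.partitions; it returns the total row count for the table and cannot take a condition):

select sum(p.rows)
from sys.partitions p
where p.object_id = object_id('MyBigTable')
and p.index_id in (0, 1) -- heap or clustered index only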

Is there a way to apply these solutions when there is a WHERE condition?

Boris

1 Answer


Most of the suggestions from the posts you've supplied speed up the row count at the expense of accuracy.

SQL Server will try to find the quickest way to do a row count. So, if it can, it will use a nonclustered index over a clustered one, resulting in fewer pages needing to be scanned.

As @nbk suggested, a nonclustered index on [ChosenColumn] (so long as it's not a filtered index) should greatly reduce the number of pages SQL Server needs to read to get an accurate result.
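
For example (a sketch; the index name is illustrative):

create nonclustered index IX_MyBigTable_ChosenColumn on MyBigTable (ChosenColumn)

-- with the index in place, the count can be answered from the narrow index rather than the whole table
select count(*) from MyBigTable where ChosenColumn = 1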

Of course, the usual caveats apply around balancing table reads and writes when adding/altering/dropping indexes.

MattM
  • My task is to analyze a big table (250 columns, millions of rows). I need to find out how many occurrences of certain defined values (e.g. -1) there are in each column. I have a solution that loops through the columns of my table and uses the methods described in my question. However, I have to do: select column into #tab from MyBigTable where column = -1, and then apply the methods to #tab. Do you see any way this can be dealt with efficiently? – Boris Oct 02 '22 at 23:27
  • @Boris, that is a different question. Ask a different question for it. – O. Jones Oct 03 '22 at 01:05
  • Not entirely sure why you'd bother selecting into a temp table before counting (a direct count is sketched after these comments). If you're doing this to avoid blocking other queries, your select statement would block just as much. (Unless you're using a NOLOCK hint, in which case, you're looking at potential inaccuracies anyway.) – MattM Oct 03 '22 at 01:19
  • There are other options you can look at, depending on how heavily the table is queried, the types of queries typically executed, how frequently the data changes, and so on. I agree with @O.Jones -- ask another question around what you're trying to achieve, what you've already attempted, and the problems you've encountered so far. – MattM Oct 03 '22 at 01:28
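
A minimal sketch of the temp-table point above, counting the defined value directly instead of materializing #tab first (SomeColumn stands in for the real column name):

-- counts the -1 values in one column; no select ... into #tab needed
select count(*) from MyBigTable where SomeColumn = -1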