
What I'm looking for is to return some estimate of the row count instead of the actual count, which can be an expensive call. Similar to what you see in a Google search ("... of about 1,000 rows").

Are there some out-of-the-box solutions for this? If not, what's the general approach?

I'm querying a SQL Server 2008 database.

EDIT: To clarify, the result count relates to certain user queries. For example, the user searches for "John" and the result should be "There are about 1,280,000 rows that match John".

veljkoz
  • One thing to remember is that the COUNT operator is relatively inexpensive due to the complex tree-like structure used by SQL Server. If you are counting the rows of a single table, the tree traversal expense is relative to the number of rows but still generally very efficient. – Brian Scott Nov 02 '10 at 13:04
  • It's expensive because of joins over very big tables and a somewhat complex WHERE clause. – veljkoz Nov 02 '10 at 13:07
  • @veljkoz: If that's the case, you may wish to run the SQL Server Profiler when running your queries and counts to grab performance statistics. Then you can run them through the SQL Server tuning advisor, which will probably suggest appropriate indexes that can drastically increase performance over large joins and large sets of data. – Brian Scott Nov 02 '10 at 13:09
  • @Brian Scott - that was already done. This is the next step... – veljkoz Nov 02 '10 at 13:10
  • Ok, one last suggestion then. If you have spent the time making the joins perform as well as possible, then perhaps you may find indexed views to be a solution (a minimal sketch follows these comments). These improve performance by pre-joining tables and result sets before querying. They can also be set up as snapshots that refresh over time or based on certain criteria, which can be useful for avoiding the most expensive parts of your query later on. Have a look here: http://technet.microsoft.com/en-us/library/cc917715.aspx – Brian Scott Nov 02 '10 at 13:16
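
To make the indexed-view suggestion concrete, here is a minimal hedged sketch; the dbo.People table and its Name column are invented for illustration, not taken from the question. An indexed view requires SCHEMABINDING, COUNT_BIG(*) when grouping, and a unique clustered index to materialise it:

-- Hypothetical indexed view that pre-aggregates match counts per name.
CREATE VIEW dbo.vNameCounts
WITH SCHEMABINDING
AS
SELECT Name, COUNT_BIG(*) AS MatchCount   -- COUNT_BIG is mandatory in a grouped indexed view
FROM dbo.People
GROUP BY Name;
GO

-- The unique clustered index materialises the view; SQL Server then
-- maintains the counts automatically as the base table changes.
CREATE UNIQUE CLUSTERED INDEX IX_vNameCounts ON dbo.vNameCounts (Name);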

6 Answers


Just to add a wild card to the existing suggestions...

If your statistics are fairly up to date, one potential idea would be to analyse the estimated execution plan from your calling code (the limitation being that this involves code outside SQL to receive and analyse the XML).

e.g.

SET SHOWPLAN_XML ON;   -- must be the only statement in its batch
GO
-- This SELECT is not executed; SQL Server returns its estimated plan as XML instead.
SELECT Something
FROM MyTable
WHERE SomeField = 'ABC123'

Then check the returned XML to pull out the 'EstimateRows' value.
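
If you'd rather stay within T-SQL, a rough sketch of the same idea is to read the plan back out of the plan cache via the DMVs and extract 'EstimateRows' with XQuery. The LIKE filter used to locate the plan below is just an illustrative assumption; match on whatever identifies your query.

-- Pull the optimizer's row estimate from the cached plan XML.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP 1
    qp.query_plan.value('(//RelOp/@EstimateRows)[1]', 'float') AS EstimatedRows
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE '%SomeField = ''ABC123''%';   -- locate the plan for the query above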

AdaTheDev
  • that's a nice solution, but is there anywhere that quantifies that the effort to calculate 'EstimateRows' is any less than a normal 'count' execution? – Brian Scott Nov 02 '10 at 13:27
  • +1 for unorthodox idea! :) I haven't thought of that, but unfortunately the estimated row count in the execution plan is very rough and can be off by millions of rows, which is not something we can present to the user. But thanks! – veljkoz Nov 02 '10 at 13:31
  • @Brian Scott - the count is calculated based on statistics during the process of generating an execution plan so doesn't have to go off and "touch" the data - it's all done during the pre-execution stage. For scenarios where a COUNT is proving expensive, it can be faster to have the hit to calculate an execution plan rather than actually execute the full query. Give it a whirl on a sizeable table – AdaTheDev Nov 02 '10 at 13:36
  • tried it and for simpler table queries it's pretty good. Thanks for the tip. – Brian Scott Nov 02 '10 at 13:55
  • +1 for cleverness, but **be very very very careful about performance here**. Setting SHOWPLAN_XML ON can tie up a *lot* of your DB server's CPU if this is a frequently run query. – Dave Markle Nov 02 '10 at 15:11

It's hard to tell what you're asking. If you're talking about returning a number from a search algorithm, you could compute a hash from the inputs and then use that hash to look up a count that you maintain periodically. That might give you "about" the right results, depending on how good the hash is and how often you update your counts.
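
As a rough sketch of that idea, assuming a hypothetical dbo.People table and a scheduled job that refreshes the figures (every name here is invented for illustration):

-- Cache table: one approximate count per hash of the search input.
CREATE TABLE dbo.SearchCountCache (
    InputHash   INT      NOT NULL PRIMARY KEY,   -- CHECKSUM of the search term
    ApproxCount BIGINT   NOT NULL,
    RefreshedAt DATETIME NOT NULL
);
GO

-- Periodic refresh (run from a scheduled job, not per user request):
DECLARE @term NVARCHAR(200) = N'John';
MERGE dbo.SearchCountCache AS target
USING (SELECT CHECKSUM(@term) AS InputHash,
              (SELECT COUNT_BIG(*) FROM dbo.People
               WHERE Name LIKE '%' + @term + '%') AS ApproxCount) AS source
ON target.InputHash = source.InputHash
WHEN MATCHED THEN
    UPDATE SET ApproxCount = source.ApproxCount, RefreshedAt = GETDATE()
WHEN NOT MATCHED THEN
    INSERT (InputHash, ApproxCount, RefreshedAt)
    VALUES (source.InputHash, source.ApproxCount, GETDATE());

-- At query time, return the cached figure instantly:
SELECT ApproxCount
FROM dbo.SearchCountCache
WHERE InputHash = CHECKSUM(@term);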

Dave Markle
  • Ok, this has potential, but do you have some other pointers on how to do it? If we have a search string which can grow up to 200 characters, we would need to update stats for each combination of characters, which is far from efficient... and not to mention that we also have other criteria as well. – veljkoz Nov 02 '10 at 14:06

Please see my comment above. However, if you are finding that the count operation is particularly expensive, there does appear to be a way to approximate the number of rows using the following:

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('sometable') AND indid < 2   -- heap (indid = 0) or clustered index (indid = 1)

This was taken from an earlier post located here:

Is count(*) really expensive?
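
For reference, sysindexes is a deprecated SQL Server 2000 compatibility view; on SQL Server 2005 and later the same approximate figure is available from the newer catalog views:

-- Sum the row counts of the heap (index_id = 0) or clustered index (index_id = 1) partitions.
SELECT SUM(p.rows) AS ApproxRows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('sometable')
  AND p.index_id IN (0, 1);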

Brian Scott

The general approach would be to take a random sample of rows to estimate how many there really are. For example, if your ids were UUIDs, you could add a filter to your SELECT statement that effectively takes a random sample: since the hex digits of a UUID are uniformly distributed, looking only at rows whose id starts with 'f' yields roughly 1/16th of the matches. Multiply that count by 16 to estimate the full row count. You would need to create an index for this to be fast, though.
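
A hedged sketch of what that could look like, assuming a hypothetical dbo.People table with a UNIQUEIDENTIFIER Id column (both names invented for illustration):

-- Rows whose id starts with 'F' are roughly 1/16th of the table,
-- so scale the sampled count back up by 16.
SELECT COUNT_BIG(*) * 16 AS ApproxMatches
FROM dbo.People
WHERE Name LIKE '%John%'                        -- the user's search criteria
  AND LEFT(CONVERT(CHAR(36), Id), 1) = 'F';     -- GUIDs convert to uppercase hex

To let the sample predicate use an index, the first character could be stored as a persisted computed column and indexed.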

Jon Snyder
  • If the performance is being hampered by multiple joins across multiple large tables, then the execution plan may undertake nearly all of the work required before it ever gets to filtering down to rows beginning with 'f'. Your solution may fit some scenarios, but it's best to be aware that it may still be inappropriate depending on the schema and joins. – Brian Scott Nov 02 '10 at 13:12

Separate from my other answer, as this is a completely different approach that you can use from within T-SQL...

Another possibility would be to use the TABLESAMPLE clause to only look at a specified number (or percentage) of data pages, and then multiply that up.

e.g.

SELECT COUNT(*) * 2   -- sampled 50 percent of the data pages, so scale the count back up
FROM MyTable TABLESAMPLE (50 PERCENT)
WHERE SomeField = 'ABC123'

Tweaking the sample size would be needed. I recommend having a full read through the BOL reference on TABLESAMPLE, as it can be very useful.

AdaTheDev

Veljkoz, based on your further updates it appears that you are looking for a general text-search algorithm rather than what I would guess are your current expensive table lookups and joins.

In SQL Server you have a full framework for exactly this: it's called Microsoft Full Text Search, and it provides you with additional querying capabilities. It gives you search syntax far more like a traditional fuzzy, Google-style search, but tailored towards your specific database tables.

There's a lot to the topic, so it's best that you take a look at this introductory article, which seems to address a similar requirement to your question:

Microsoft Full Text Search article
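
As a small illustration of the querying syntax, assuming a full-text index already exists on a hypothetical dbo.People(Name) column:

-- CONTAINS is resolved against the full-text index rather than by scanning,
-- so counting matches is far cheaper than a LIKE '%John%' predicate.
SELECT COUNT(*) AS ApproxMatches
FROM dbo.People
WHERE CONTAINS(Name, N'"John*"');   -- prefix term: words starting with "John"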

Brian Scott
  • We investigated that, and found that expressions like LIKE '%abc[df]_e%' aren't supported, which is what we need... – veljkoz Nov 02 '10 at 13:29