COUNT_BIG in Indexed View

Question

CREATE TABLE test2 (
id INTEGER,
name VARCHAR(10),
family VARCHAR(10),
amount INTEGER)

CREATE VIEW dbo.test2_v WITH SCHEMABINDING 
AS
SELECT id, SUM(amount) as amount
-- , COUNT_BIG(*) as tmp
FROM dbo.test2 
GROUP BY id

CREATE UNIQUE CLUSTERED INDEX vIdx ON test2_v(id)

I have error with this code:

Cannot create index on view 'test.dbo.test2_v' because its select list does not include a proper use of COUNT_BIG. Consider adding COUNT_BIG(*) to select list.

I can create view like this:

CREATE VIEW dbo.test2_v WITH SCHEMABINDING 
    AS
    SELECT id, SUM(amount) as amount, COUNT_BIG(*) as tmp
    FROM dbo.test2 
    GROUP BY id

But I'm just wondering what is purpose of this column in this case?

Previous equivalent question: http://stackoverflow.com/questions/3501825/sql-server-create-view-index-which-contains-distinct-or-group-by — Tao, May 17 '11 at 11:38

score 22 · Accepted Answer · edited Apr 28 '15 at 19:57

22

You need COUNT_BIG in this case because of the fact you are using GROUP BY.

This is one of many limitations of Indexed Views and because of these restrictions, Indexed Views can't be used in many places or the usage of it is NOT as effective as it could have been. Unfortunately, it is how it works currently. Sucks, it narrows the scope of usage.

http://technet.microsoft.com/en-us/library/cc917715.aspx

edited Apr 28 '15 at 19:57

Mike G

4,232
9
40
66

answered May 17 '11 at 11:38

Sankar Reddy

1,499
9
10

15

It prevents SQL Server from having to do table scans during a delete operation. If I delete 14 rows with the same ID from a table, and that makes the SUM() expression equal 0, does that mean a) there were only 14 rows with that ID, delete it from the index, or b) There are remaining rows with this ID in the table, but their amount column SUMs to 0? If it can look at COUNT_BIG, it can answer that question immediately. – Damien_The_Unbeliever May 17 '11 at 14:02
1

Hah, I knew there was a simple brief explanation somewhere, but I couldn't find it anymore so I added all that additional, possibly-unnecessary detail on my answer below. Thanks @Damien_The_Unbeliever, yours is much much more concise and therefore probably clearer. If you were to add an actual answer I certainly would upvote it :) – Tao Aug 29 '16 at 09:58
1

(downvote on original answer - the answer simply restates the restriction and then complains a bit, without providing any explanation as to why the restriction exists or how it actually makes sense) – Tao Aug 29 '16 at 10:07

Tao · Answer 2 · 2016-08-29T09:49:29.153

Looks like it's simply a hardcoded performance-related restriction that the SQL Server team had to put in place when they first designed aggregate indexed views in SQL Server 2000.

Until relatively recently you could see this in the SQL 2000 technet documentation at http://msdn.microsoft.com/en-us/library/aa902643(SQL.80).aspx, but the SQL Server 2000 documentation has been definitely retired. You can still download a 92MB PDF file, and find the relevant notes on pages 1146 and 2190: https://www.microsoft.com/en-us/download/details.aspx?id=51958

An explanation for this restriction can be found on the SQLAuthority site - actually an excerpt from Itzik Ben-Gan's "Inside SQL" book: http://blog.sqlauthority.com/2010/09/21/sql-server-count-not-allowed-but-count_big-allowed-limitation-of-the-view-5/

It's worth noting that Oracle has the same restriction/requirement, for the same reasons (for an equivalent fast refreshable materialized view); see http://rwijk.blogspot.com.es/2009/06/fast-refreshable-materialized-view.html for a discussion on this topic.

Summary of the explanation:

Why does sql server logically need a materialized global count column in the indexed aggregate view?
- So that it can quickly check / know whether a particular row in the aggregate view needs to change or go, when a given row of an underlying table is updated or deleted.
Why does this count column need to be COUNT_BIG(*)?
- So that there is no possible risk of overflow; by forcing the use of the bigint datatype, there is no risk of an indexed view "breaking" when a particular row reaches an overly high count.

It's relatively easy to visualize why a count is critical to efficient aggregate view maintenance - imagine the following situation:

The table structures are as specified in the question

There are 4 rows in the underlying table:

ID  | name | family | amount
--- | ---- | ------ | ------
1   | a    |        | 10    
2   | b    |        | 11    
2   | c    |        | 12    
3   | d    |        | 13

The aggregate view is materialized to something like this:

ID  | amount | tmp
--- | ------ | ---
1   | 10     | 1
2   | 23     | 2
3   | 13     | 1

Simple Case:
- The SQL Engine detects a change in the underlying data - the third row in the source data (id 2, name c) is deleted.
- The engine needs to:
  - find and update the relevant row of the aggregate materialized view
  - reduce the "amount" sum by the amount of the deleted underlying row
  - reduce the "count" by 1 (if this column exists)
Target/difficult case:
- The SQL Engine detects another change in the underlying data - the second row in the source data (id 2, name b) is deleted.
- The engine needs to:
  - find and delete the relevant row of the aggregate materialized view, as there are no more source rows with the same grouping key
Consider that the engine always has the "before" row of the underlying table(s) at view-update time - it knows exactly what changed in both cases.
The notable "step" in the materialized-view-maintenance algorithm is determining whether the target materialized aggregate row needs to be deleted or not
- if you have a "count", you don't need to look anywhere beyond the target row - if you're dropping the count to 0, then delete the row. If you're updating to any other value, then leave the row.
- if you don't have a count, then the only way for you to figure it out, would be to query the underlying table to check for any other rows with the same aggregation key; such a process would clearly introduce much more onerous restrictions:
  - it would be implicitly slower, and
  - in join-aggregation cases would be un-optimizable!

For these reasons, the existence of a count(*) column is a fundamental requirement of the aggregate materialized view implementation. Without a count(*) column, the real-time maintenance of an aggregate materialized view in the face of underlying data changes would carry an unacceptably high performance penalty!

You could still ask "why doesn't SQL Server create/maintain such a count column for me automatically when I create an aggregate materialized view?" - I don't have a particularly good answer for this. In the end, I imagine there would be more questions and confusion about "Why does my aggregate materialized view have a BIGCOUNT column if I didn't add it?" if they did that, so it's simpler to make it a basic requirement of creation of the object, but that's a purely subjective opinion.

I appreciate Tao :). but i didn't find out what actually it needs count_big(*), just performance issue?! really Ms Sql can't handle by owns structures?!! — Mohammadreza, Aug 26 '16 at 14:52
@Mohammadreza I added more detail to the explanation (and a note about how the exact same requirement/restriction applies in Oracle :)); please let me know if it doesn't make sense or needs further clarification. — Tao, Aug 29 '16 at 09:50

score 1 · Answer 3 · edited Apr 14 '14 at 15:44

I know this thread is a bit old but for those who still have this question, http://technet.microsoft.com/en-us/library/ms191432%28v=sql.105%29.aspx says this about indexed views

The SELECT statement in the view cannot contain the following Transact-SQL syntax elements:

The AVG, MAX, MIN, STDEV, STDEVP, VAR, or VARP aggregate functions. If AVG(expression) is specified in queries referencing the indexed view, the optimizer can frequently calculate the needed result if the view select list contains SUM(expression) and COUNT_BIG(expression). For example, an indexed view SELECT list cannot contain the expression AVG(column1). If the view SELECT list contains the expressions SUM(column1) and COUNT_BIG(column1), SQL Server can calculate the average for a query that references the view and specifies AVG(column1).

COUNT_BIG in Indexed View

3 Answers3

Linked