Looks like it's simply a hardcoded performance-related restriction that the SQL Server team had to put in place when they first designed aggregate indexed views in SQL Server 2000.
Until relatively recently you could see this in the SQL 2000 technet documentation at http://msdn.microsoft.com/en-us/library/aa902643(SQL.80).aspx, but the SQL Server 2000 documentation has been definitely retired. You can still download a 92MB PDF file, and find the relevant notes on pages 1146 and 2190: https://www.microsoft.com/en-us/download/details.aspx?id=51958
An explanation for this restriction can be found on the SQLAuthority site - actually an excerpt from Itzik Ben-Gan's "Inside SQL" book: http://blog.sqlauthority.com/2010/09/21/sql-server-count-not-allowed-but-count_big-allowed-limitation-of-the-view-5/
It's worth noting that Oracle has the same restriction/requirement, for the same reasons (for an equivalent fast refreshable materialized view); see http://rwijk.blogspot.com.es/2009/06/fast-refreshable-materialized-view.html for a discussion on this topic.
Summary of the explanation:
- Why does sql server logically need a materialized global count column in the indexed aggregate view?
- So that it can quickly check / know whether a particular row in the aggregate view needs to change or go, when a given row of an underlying table is updated or deleted.
- Why does this count column need to be
COUNT_BIG(*)
?
- So that there is no possible risk of overflow; by forcing the use of the bigint datatype, there is no risk of an indexed view "breaking" when a particular row reaches an overly high count.
It's relatively easy to visualize why a count is critical to efficient aggregate view maintenance - imagine the following situation:
- The table structures are as specified in the question
There are 4 rows in the underlying table:
ID | name | family | amount
--- | ---- | ------ | ------
1 | a | | 10
2 | b | | 11
2 | c | | 12
3 | d | | 13
The aggregate view is materialized to something like this:
ID | amount | tmp
--- | ------ | ---
1 | 10 | 1
2 | 23 | 2
3 | 13 | 1
- Simple Case:
- The SQL Engine detects a change in the underlying data - the third row in the source data (id 2, name c) is deleted.
- The engine needs to:
- find and update the relevant row of the aggregate materialized view
- reduce the "amount" sum by the amount of the deleted underlying row
- reduce the "count" by 1 (if this column exists)
- Target/difficult case:
- The SQL Engine detects another change in the underlying data - the second row in the source data (id 2, name b) is deleted.
- The engine needs to:
- find and delete the relevant row of the aggregate materialized view, as there are no more source rows with the same grouping key
- Consider that the engine always has the "before" row of the underlying table(s) at view-update time - it knows exactly what changed in both cases.
- The notable "step" in the materialized-view-maintenance algorithm is determining whether the target materialized aggregate row needs to be deleted or not
- if you have a "count", you don't need to look anywhere beyond the target row - if you're dropping the count to 0, then delete the row. If you're updating to any other value, then leave the row.
- if you don't have a count, then the only way for you to figure it out, would be to query the underlying table to check for any other rows with the same aggregation key; such a process would clearly introduce much more onerous restrictions:
- it would be implicitly slower, and
- in join-aggregation cases would be un-optimizable!
For these reasons, the existence of a count(*) column is a fundamental requirement of the aggregate materialized view implementation. Without a count(*) column, the real-time maintenance of an aggregate materialized view in the face of underlying data changes would carry an unacceptably high performance penalty!
You could still ask "why doesn't SQL Server create/maintain such a count column for me automatically when I create an aggregate materialized view?" - I don't have a particularly good answer for this. In the end, I imagine there would be more questions and confusion about "Why does my aggregate materialized view have a BIGCOUNT column if I didn't add it?" if they did that, so it's simpler to make it a basic requirement of creation of the object, but that's a purely subjective opinion.