
SQL Server does not maintain statistics for table variables, and it historically always estimated 1 output row. Microsoft changed this behavior starting with SQL Server 2019, though I am not able to find much information about those changes.

Below is my sample code. Note that dbo.Integers is just a local user table that contains 100k integers in incremental order.
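For reference, a minimal sketch of how a helper table like dbo.Integers could be created and populated (the table and column names match the query below; the cross-join trick is just one assumed way to generate 100k rows):

create table dbo.Integers (Num int not null primary key clustered);

-- Generate 100,000 sequential integers by cross joining a system view with itself.
insert dbo.Integers (Num)
select top (100000) row_number() over (order by (select null))
from sys.all_columns a
cross join sys.all_columns b;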

declare @myTestTable table  (
    ID int not null,
    Symbol varchar(100) not null primary key clustered,
    Price decimal(9,2) null,
    EntityID int not null
);

insert @myTestTable (ID, Symbol, Price, EntityID)
select ID = n.Num,
    Symbol = cast(hashbytes('SHA', cast(n.Num as varchar(30))) as varchar(100)),
    Price = rand(n.Num) * 1000,
    EntityID = case when n.Num < 4000 then 1 else n.Num end
from dbo.Integers n
where n.Num < 5000;

select * from @myTestTable
where EntityID = 1  

Here are the execution plan details.

[Screenshot of the execution plan properties showing Estimated Number of Rows Per Execution = 70.70]

It is clear from the above that the estimated number of rows, 70.70, is the square root of the total number of rows (4999). I changed the number of rows and the same calculation applied.

I was not able to find proper documentation on this, so could you please help me understand what changes Microsoft has made in recent versions related to table variable statistics and estimation, and whether the estimated rows are always the square root of the total number of rows in the case of a table variable?

KnowledgeSeeeker
  • You are looking at *estimated number of rows per execution*, not *estimated number of rows to be read* (?) – topsail Jul 07 '23 at 18:26
  • Yes, I am looking at *estimated number of rows per execution*. When I replace the table variable with a temp table, it gives me the correct estimate, i.e. 3999 – KnowledgeSeeeker Jul 07 '23 at 18:38
  • Did you try with a #temp table? There’s a narrow use case for table variables but usually a #temp table will provide more consistency and predictability. – Stuck at 1337 Jul 07 '23 at 19:11
  • Table variables have always had cardinality information and never had column statistics (and still don't) - the only change is that now compilation can be deferred until later on so it can take account of the cardinality information (rather than statements being compiled when the table variable cardinality was still zero as nothing had been inserted yet leading to the "one row" estimate). The cardinality information only has the number of rows in the table variable so it needs to resort to guesses as to how many of those will match `Num < 5000` rather than being able to refer to histograms – Martin Smith Jul 07 '23 at 19:29
  • Or rather how many of those will match `EntityID = 1`. On older versions of SQL Server you can do `select * from @myTestTable where EntityID = 1 OPTION (RECOMPILE)` to see the same estimated rows - the `OPTION (RECOMPILE)` means that the statement will be recompiled after the 4999 rows have been inserted to the table variable – Martin Smith Jul 07 '23 at 19:42
  • @MartinSmith So is the square root of the total number of rows a fixed formula for estimating the number of rows retrieved in newer versions? – KnowledgeSeeeker Jul 08 '23 at 05:06
  • @Stuckat1337 yeah, in most cases I prefer temp tables over table variables. I just want to know how estimation works for table variables in newer versions. – KnowledgeSeeeker Jul 08 '23 at 05:08
  • @KnowledgeSeeeker - It looks reasonably fixed as all compat levels of >= 120 use this guess but the exact details of cardinality estimation are undocumented and subject to change. The legacy cardinality estimator guesses `594.514` for this same case. So they have already revised the method at least once - from `POWER(4999.0,.75)` to `POWER(4999.0,.5)` – Martin Smith Jul 08 '23 at 10:47

1 Answer


Table variables have always had cardinality information, with rows in tempdb.sys.partitions showing the correct row count.
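For example, while the batch is still running (table variables only live for the duration of the batch), the row count can be seen in tempdb metadata. This is just an illustrative check; the object name for a table variable is system-generated and starts with '#':

-- Run inside the same batch, or from another session while the batch runs.
select o.name, p.rows
from tempdb.sys.partitions as p
join tempdb.sys.objects as o
    on o.object_id = p.object_id
where o.name like '#%'        -- table variables get system-generated '#...' names
  and p.index_id in (0, 1);   -- heap or clustered index row counts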

The reason execution plans historically often "estimate 1 output row" is that SQL Server compiles all statements in a batch before executing any of them. This means that statements selecting from the table variable are compiled before the statements inserting rows into it have been executed, so the table cardinality is still zero.

The new behaviour is just to defer compilation of these statements so they can be compiled just before execution (at which point an accurate row count can be known).
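In SQL Server 2019 this deferred compilation is tied to compatibility level 150 and can be switched off for testing via the DEFERRED_COMPILATION_TV database scoped configuration. A sketch (the database name is illustrative):

-- Deferred table variable compilation requires compatibility level 150 or higher.
alter database [YourDatabase] set compatibility_level = 150;

-- Toggle the feature to compare the old and new estimates.
alter database scoped configuration set DEFERRED_COMPILATION_TV = off;
alter database scoped configuration set DEFERRED_COMPILATION_TV = on;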

On previous versions it has always been possible to add an OPTION (RECOMPILE) hint to get a plan compiled that takes this row count into account too.
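Applied to the query from the question:

select * from @myTestTable
where EntityID = 1
option (recompile);  -- recompiled after the 4999 rows have been inserted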

Table variables have never had column statistics (and still don't). All SQL Server knows is that the table contains 4999 rows - it has no histogram to refer to that tells it that 3999 of those rows have EntityID = 1 so it needs to resort to guesses as to how many of those will match the predicate.
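For contrast, a sketch of the same test using a #temp table (as suggested in the comments). Temp tables do support auto-created column statistics, so the estimate for EntityID = 1 comes out at about 3999 instead of a guess:

create table #myTestTable (
    ID int not null,
    Symbol varchar(100) not null primary key clustered,
    Price decimal(9,2) null,
    EntityID int not null
);

insert #myTestTable (ID, Symbol, Price, EntityID)
select ID = n.Num,
    Symbol = cast(hashbytes('SHA', cast(n.Num as varchar(30))) as varchar(100)),
    Price = rand(n.Num) * 1000,
    EntityID = case when n.Num < 4000 then 1 else n.Num end
from dbo.Integers n
where n.Num < 5000;

-- Column statistics on EntityID are auto-created when this statement compiles,
-- so the histogram tells the optimizer that roughly 3999 rows match.
select * from #myTestTable
where EntityID = 1;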

With the new cardinality estimator it guesses 70.7036 (POWER(4999e0,.5)) and with the legacy cardinality estimator 594.514 (as heuristic used previously for this case was POWER(4999e0,.75)).
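A quick check of the two guesses:

select power(4999e0, 0.5)  as new_ce_guess,     -- ≈ 70.7036
       power(4999e0, 0.75) as legacy_ce_guess;  -- ≈ 594.514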

The query_optimizer_estimate_cardinality extended event shows that the stats are treated as a black box and it just has to guess.

name: query_optimizer_estimate_cardinality

input_relation: <Operator Name="LogOp_Select" ClassNo="32"><StatsCollection Name="CStCollBlackBox" Id="1" Card="4999.00"/><Operator Name="ScaOp_Comp " ClassNo="100"><CompInfo CompareOp="EQ"/><Operator Name="ScaOp_Identifier " ClassNo="99"><IdentifierInfo TableName="" ColumnName="EntityID"/></Operator><Operator Name="ScaOp_Const " ClassNo="98"><ConstInfo Type="int" Value="(1)"/></Operator></Operator></Operator>

calculator: <CalculatorList><FilterCalculator CalculatorName="CSelCalcPointPredsFreqBased"><SubCalculator Role="DistinctCountPlan"><DistinctCountCalculator CalculatorName="CDVCPlanLeaf" Guesses="1"/></SubCalculator></FilterCalculator></CalculatorList>

stats_collection: <StatsCollection Name="CStCollFilter" Id="2" Card="70.70"/>
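A minimal sketch of an extended event session that captures this event (the session name is illustrative; query_optimizer_estimate_cardinality is a debug-channel event, so debug events must be shown when browsing the event list):

create event session ce_estimates on server
add event sqlserver.query_optimizer_estimate_cardinality
add target package0.ring_buffer;

alter event session ce_estimates on server state = start;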
Martin Smith