This is basically the same question asked in (Solution for speeding up a slow SELECT DISTINCT query in Postgres )
It is pretty the same situation, a huge deployed database and it is not possible to normalise it due to old legacy applications. New rows are constantly added and old ones are removed fulfiling certain criterias. I have tried the suggestion by using CTE and I see no performance gain here, pretty the same execution time as for the original
select distinct [somecolumn]
from bigtable
Same applies for the suggestion to use Group by
.
What seems to work best is the suggestion to create a view and query the view instead. (the cache has been reseted between queries)
I need some advice here on this because I do not really understand why this results in better performance.
create view [dbo].[vwDistinct]
with schemabinding
as
select
[somecolumn], count_big(*) as TableCount
from
dbo.BigTable
where
somecolumn IS NOT NULL
group by
somecolumn;
select distinct somecolumn
from vwDistinct
The application uses a stored procedure for the call. The database is on SQL Server 2008 R2 but if there is good reason, it can be moved to SQL Server 2014.
Thank you
This is the execution plan for select somecolumn from vwDistinct
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.2000.8" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="41138.3" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="5.10782" StatementText="select somecolumn from vwDistinct" StatementType="SELECT" QueryHash="0x23700E4CF62A8E4E" QueryPlanHash="0x79D8240601D270CB" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan NonParallelPlanReason="EstimatedDOPIsOne" CachedPlanSize="16" CompileTime="59" CompileCPU="18" CompileMemory="336">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="1239807" EstimatedPagesCached="77487" EstimatedAvailableDegreeOfParallelism="1" />
<RelOp AvgRowSize="37" EstimateCPU="1.03877" EstimateIO="4.06905" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="944197" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="5.10782" TableCardinality="944197">
<OutputList>
<ColumnReference Database="[BigData]" Schema="[dbo]" Table="[vwDistinct]" Column="somecolumn" />
</OutputList>
<IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="true" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[BigData]" Schema="[dbo]" Table="[vwDistinct]" Column="somecolumn" />
</DefinedValue>
</DefinedValues>
<Object Database="[BigData]" Schema="[dbo]" Table="[vwDistinct]" Index="[cdxDistinct]" IndexKind="ViewClustered" Storage="RowStore" />
<IndexedViewInfo>
<Object Database="[BigData]" Schema="[dbo]" Table="[BigTable]" />
</IndexedViewInfo>
</IndexScan>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>