I have a huge table with more than one hundred million of rows and I have to query this table to return a set of data in a minimum of time.
So I have created a test environment with this table definition:
CREATE TABLE [dbo].[Test](
[Dim1ID] [nvarchar](20) NOT NULL,
[Dim2ID] [nvarchar](20) NOT NULL,
[Dim3ID] [nvarchar](4) NOT NULL,
[Dim4ID] [smalldatetime] NOT NULL,
[Dim5ID] [nvarchar](20) NOT NULL,
[Dim6ID] [nvarchar](4) NOT NULL,
[Dim7ID] [nvarchar](4) NOT NULL,
[Dim8ID] [nvarchar](4) NOT NULL,
[Dim9ID] [nvarchar](4) NOT NULL,
[Dim10ID] [nvarchar](4) NOT NULL,
[Dim11ID] [nvarchar](20) NOT NULL,
[Value] [decimal](21, 6) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[Dim1ID] ASC,
[Dim2ID] ASC,
[Dim3ID] ASC,
[Dim4ID] ASC,
[Dim5ID] ASC,
[Dim6ID] ASC,
[Dim7ID] ASC,
[Dim8ID] ASC,
[Dim9ID] ASC,
[Dim10ID] ASC,
[Dim11ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This table is the fact table of Star schema architecture (fact/dimensions). As you can see I have a clustered index on all the columns except for the “Value” column.
I have filled this data with approx. 10,000,000 rows for testing purpose. The fragmentation is currently at 0.01%.
I would like to improve the performance when reading a set of rows from this table using this query:
DECLARE @Dim1ID nvarchar(20) = 'C1'
DECLARE @Dim9ID nvarchar(4) = 'VRT1'
DECLARE @Dim10ID nvarchar(4) = 'S1'
DECLARE @Dim6ID nvarchar(4) = 'FRA'
DECLARE @Dim7ID nvarchar(4) = '' -- empty = all
DECLARE @Dim8ID nvarchar(4) = '' -- empty = all
DECLARE @Dim2 TABLE ( Dim2ID nvarchar(20) NOT NULL )
INSERT INTO @Dim2 VALUES ('A1'), ('A2'), ('A3'), ('A4');
DECLARE @Dim3 TABLE ( Dim3ID nvarchar(4) NOT NULL )
INSERT INTO @Dim3 VALUES ('P1');
DECLARE @Dim4ID TABLE ( Dim4ID smalldatetime NOT NULL )
INSERT INTO @Dim4ID VALUES ('2009-01-01'), ('2009-01-02'), ('2009-01-03');
DECLARE @Dim11 TABLE ( Dim11ID nvarchar(20) NOT NULL )
INSERT INTO @Dim11 VALUES ('Var0001'), ('Var0040'), ('Var0060'), ('Var0099')
SELECT RD.Dim2ID,
RD.Dim3ID,
RD.Dim4ID,
RD.Dim5ID,
RD.Dim6ID,
RD.Dim7ID,
RD.Dim8ID,
RD.Dim9ID,
RD.Dim10ID,
RD.Dim11ID,
RD.Value
FROM dbo.Test RD
INNER JOIN @Dim2 R
ON RD.Dim2ID = R.Dim2ID
INNER JOIN @Dim3 C
ON RD.Dim3ID = C.Dim3ID
INNER JOIN @Dim4ID P
ON RD.Dim4ID = P.Dim4ID
INNER JOIN @Dim11 V
ON RD.Dim11ID = V.Dim11ID
WHERE RD.Dim1ID = @Dim1ID
AND RD.Dim9ID = @Dim9ID
AND ((@Dim6ID <> '' AND RD.Dim6ID = @Dim6ID) OR @Dim6ID = '')
AND ((@Dim7ID <> '' AND RD.Dim7ID = @Dim7ID) OR @Dim7ID = '')
AND ((@Dim8ID <>'' AND RD.Dim8ID = @Dim8ID) OR @Dim8ID = '')
I have tested this query and that’s returned 180 rows with these times: 1st execution: 1 min 32 sec; 2nd execution: 1 min.
I would like to return the data in a few seconds if it’s possible.
I think I can add the non-clustered indexes but I am not sure what the best way is to set the non-clustered indexes! If having sorted order data in this table could improve the performances? Or are there other solutions than indexes?
Thanks.