How to improve the performance of my query

Question

The query below to get the distinct zip codes from the Address table takes roughly 4 mins and 42 seconds. There are 1,006,699 records in the Address table. The composite key for the table is Address1, Address2, City, ZipCode.

There have been times when the query takes 5 seconds to run or even 1 mill seconds.

How do I improve the performance of the query?

Here is the SQL query:

 SELECT DISTINCT ZipCode FROM Address

Here is the schema for the table:

CREATE TABLE [dbo].[Address]
(
    [AddressID] [INT] IDENTITY(1,1) NOT NULL,
    [Address1] [NVARCHAR](1000) NOT NULL,
    [Address2] [NVARCHAR](1000) NOT NULL,
    [City] [NVARCHAR](1000) NOT NULL,
    [StateCd] [NVARCHAR](2) NULL,
    [ZipCode] [NVARCHAR](10) NOT NULL,

    PRIMARY KEY CLUSTERED 
    ([Address1] ASC, [Address2] ASC, [City] ASC, [ZipCode] ASC)
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Address] ADD DEFAULT ('') FOR [Address2]
GO

I can't seem to add an image of the execution plan.

You you must use the exact same columns for primary key you could move ZIP-Code first. — Salman A, Jan 18 '19 at 13:13

Gordon Linoff · Accepted Answer · 2019-01-18T12:41:24.527

2

For this query:

SELECT DISTINCT ZipCode FROM Address

You want an index on ZipCode or at least where ZipCode is the first column:

create index idx_address_zipcode on address(zipcode);

The resulting execution plan should be a scan of the index, which is much faster than processing the original table (and aggregating to get the distinct zip codes).

You could also change your existing index to (zipcode, city, address1, address2). This makes the index more useful (in my opinion), because zipcode is more likely to be used for filtering than address1. However, that index will be larger than one only on zipcode.

edited Jan 18 '19 at 12:41

answered Jan 18 '19 at 12:37

Gordon Linoff

1,242,037
58
646
786

So this would be a nonclustered index? – rds80 Jan 18 '19 at 12:39
1

Yes; you can only have one clustered index, and you already have it. – Amadan Jan 18 '19 at 12:40
@rds80 . . . Yes. I would advise you to have the `AddressId` be the clustered index, to avoid fragmentation when the data is modified. – Gordon Linoff Jan 18 '19 at 12:40
When I only had AddressID has the primary key, the query to get all the records was taking over a minute long. – rds80 Jan 18 '19 at 12:44

score 0 · Answer 2 · answered Jan 18 '19 at 12:45

For maximum performance, you could create an indexed view so that the aggregation is materialized:

CREATE VIEW vw_Address_ZipCode
WITH SCHEMABINDING
AS
SELECT ZipCode, COUNT_BIG(*) AS ZipCodeCount
FROM dbo.Address
GROUP BY ZipCode;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_Address_ZipCode(ZipCode);
GO

If you are using Enterprise Edition, the optimizer can consider the indexed view without referencing the view directly:

SELECT DISTINCT ZipCode FROM Address;

In lesser editions, you'll need to query the view and add a NOEXPAND query hint so the index is considered for optimization:

SELECT DISTINCT ZipCode FROM dbo.vw_Address_ZipCode WITH(NOEXPAND);

See the documentation for indexed view requirements.

Very cool! I wouldn’t have to do aggregations like how many records are there per zip code. What I’m trying to achieve is an autocomplete feature on the UI with the zip code. — rds80, Jan 18 '19 at 12:51
`DISTINCT` is an aggregation (single row from multiple source rows). Even if you don't use it, `COUNT_BIG()` is required for the view to be indexed. — Dan Guzman, Jan 18 '19 at 12:54

score 0 · Answer 3 · answered Jan 18 '19 at 12:57

0

1.If possible convert ZipCode datatype from nvarchar to bigint.
2.try group by ZipCode

 SELECT ZipCode FROM Address GROUP BY ZipCode;

answered Jan 18 '19 at 12:57

dvijparekh

936
7
16

Some of the zip codes have 5 numbers "-" 4 numbers. Is there a performance difference between distinct and group by? – rds80 Jan 18 '19 at 13:10
https://stackoverflow.com/questions/25114506/distinct-vs-group-by/25114686#25114686 yes it does check this link – dvijparekh Jan 18 '19 at 13:22
@rds80 if you have "-" in zip code validate it from the client side or validate it with your code at server side and remove "-" if found in the zip code. For values in the database I would suggest replacing "-" with SQL query or to be safe side do it with code – dvijparekh Jan 18 '19 at 13:27

How to improve the performance of my query

3 Answers3