0

The query below to get the distinct zip codes from the Address table takes roughly 4 mins and 42 seconds. There are 1,006,699 records in the Address table. The composite key for the table is Address1, Address2, City, ZipCode.

There have been times when the query takes 5 seconds to run or even 1 mill seconds.

How do I improve the performance of the query?

Here is the SQL query:

 SELECT DISTINCT ZipCode FROM Address

Here is the schema for the table:

CREATE TABLE [dbo].[Address]
(
    [AddressID] [INT] IDENTITY(1,1) NOT NULL,
    [Address1] [NVARCHAR](1000) NOT NULL,
    [Address2] [NVARCHAR](1000) NOT NULL,
    [City] [NVARCHAR](1000) NOT NULL,
    [StateCd] [NVARCHAR](2) NULL,
    [ZipCode] [NVARCHAR](10) NOT NULL,

    PRIMARY KEY CLUSTERED 
    ([Address1] ASC, [Address2] ASC, [City] ASC, [ZipCode] ASC)
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Address] ADD DEFAULT ('') FOR [Address2]
GO

I can't seem to add an image of the execution plan.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
rds80
  • 631
  • 1
  • 10
  • 23

3 Answers3

2

For this query:

SELECT DISTINCT ZipCode FROM Address

You want an index on ZipCode or at least where ZipCode is the first column:

create index idx_address_zipcode on address(zipcode);

The resulting execution plan should be a scan of the index, which is much faster than processing the original table (and aggregating to get the distinct zip codes).

You could also change your existing index to (zipcode, city, address1, address2). This makes the index more useful (in my opinion), because zipcode is more likely to be used for filtering than address1. However, that index will be larger than one only on zipcode.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

For maximum performance, you could create an indexed view so that the aggregation is materialized:

CREATE VIEW vw_Address_ZipCode
WITH SCHEMABINDING
AS
SELECT ZipCode, COUNT_BIG(*) AS ZipCodeCount
FROM dbo.Address
GROUP BY ZipCode;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_Address_ZipCode(ZipCode);
GO

If you are using Enterprise Edition, the optimizer can consider the indexed view without referencing the view directly:

SELECT DISTINCT ZipCode FROM Address;

In lesser editions, you'll need to query the view and add a NOEXPAND query hint so the index is considered for optimization:

SELECT DISTINCT ZipCode FROM dbo.vw_Address_ZipCode WITH(NOEXPAND);

See the documentation for indexed view requirements.

Dan Guzman
  • 43,250
  • 3
  • 46
  • 71
  • Very cool! I wouldn’t have to do aggregations like how many records are there per zip code. What I’m trying to achieve is an autocomplete feature on the UI with the zip code. – rds80 Jan 18 '19 at 12:51
  • 1
    `DISTINCT` is an aggregation (single row from multiple source rows). Even if you don't use it, `COUNT_BIG()` is required for the view to be indexed. – Dan Guzman Jan 18 '19 at 12:54
0

1.If possible convert ZipCode datatype from nvarchar to bigint.
2.try group by ZipCode

 SELECT ZipCode FROM Address GROUP BY ZipCode;
dvijparekh
  • 936
  • 7
  • 16
  • Some of the zip codes have 5 numbers "-" 4 numbers. Is there a performance difference between distinct and group by? – rds80 Jan 18 '19 at 13:10
  • https://stackoverflow.com/questions/25114506/distinct-vs-group-by/25114686#25114686 yes it does check this link – dvijparekh Jan 18 '19 at 13:22
  • @rds80 if you have "-" in zip code validate it from the client side or validate it with your code at server side and remove "-" if found in the zip code. For values in the database I would suggest replacing "-" with SQL query or to be safe side do it with code – dvijparekh Jan 18 '19 at 13:27