3

I have a table that contains ranges of IP addresses (as integer values) and corresponding country, region and city assigned to that range. It looks as follows:

CREATE TABLE [dbo].[IpToRegion]
(
    [BeginRange] [bigint] NOT NULL,
    [EndRange] [bigint] NOT NULL,
    [CountryCode] [varchar](10) NOT NULL,
    [Country] [varchar](50) NOT NULL,
    [Region] [varchar](100) NOT NULL,
    [City] [varchar](100) NOT NULL
) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-20151031-193911] ON [dbo].[IpToRegion]
(
    [BeginRange] ASC,
    [EndRange] ASC
)
GO

There are 9.1M rows in this table. In order to find the location of a single IP address, I first convert it to a big int and then execute the following query:

DECLARE @IPNumber BIGINT
DECLARE @IPAddress varchar(20)

Set @IPNumber = (CONVERT(bigint, PARSENAME(@IPAddress,1)) + CONVERT(bigint, PARSENAME(@IPAddress,2)) * 256 + CONVERT(bigint, PARSENAME(@IPAddress,3)) * 65536 + CONVERT(bigint, PARSENAME(@IPAddress,4)) * 16777216)

Select City + ', ' + Region + ', ' + Country
From IpToRegion 
Where @IPNumber Between BeginRange And EndRange

The problem is this query can take anywhere from 5 to 20 seconds to execute. Here is the query plan:

enter image description here

Of course, my question is how can this query take so long to execute? It's doing a seek on the clustered index and returns a single row. I could try a few different indexing strategies. But, at this point, I'm more curious as to why this query can perform so poorly.

Randy Minder
  • 47,200
  • 49
  • 204
  • 358
  • That is a good question. I can check. This table was just created and it's static. I simply created the table and loaded the data. The rows were added in BeginRange order, but I'll try rebuilding the index. – Randy Minder Nov 03 '15 at 22:50
  • I don't know why somebody downvoted, but try to change the title of the question to something more descriptive. Like "how to search efficiently in IP addresses ranges"... – Vladimir Baranov Nov 03 '15 at 22:58

2 Answers2

5

This kind of search can't be done efficiently with the index that you have.

If you look at details of the Index Seek operator in the plan you'd see two predicates.

@IPNumber >= BeginRange 
@IPNumber <= EndRange

Index helps to quickly O(log(n)) find the start (or end) of the range, but then it has to check the second predicate for the rest of the rows in the table.

Check out the actual number of rows read in the plan. It will be large.

If I'm not mistaken, there was somewhat similar (more complicated) question before. Even though, it was asked for Postgres, the approach would work in SQL Server as well. In that question this kind of search was done not once, but 600K times.


The answer to the question "How to make this search efficient" depends on few things. First: can you guarantee that IP ranges in your table do not overlap? In other words, can you guarantee that any search will return 0 or 1 rows?

If yes, adding a simple TOP(1) to the query could be enough.

Community
  • 1
  • 1
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
0

It turns out that the clustered index I had on BeginRange + EndRange was not efficient as Vladimir Baranov stated in his answer. What I did was to create a PK / clustered index on BeginRange and a separate index on EndRange. Now the query performs instantly.

Randy Minder
  • 47,200
  • 49
  • 204
  • 358