1

I know about text indexing, but this is different. I have 2 byte array columns in a table, labeled StartByteArray & EndByteArray. The Start column is a starting IP address in byte array form, and the same with the End column, except it is the stop IP. You can think of the high & low columns as boundaries of IP Addresses. It looks like this (just 10 rows shown):

StartIPAddress  StartByteArray  EndIPAddress    EndByteArray
41.0.0.0        0x29000000      41.31.255.255   0x291FFFFF
41.32.0.0       0x29200000      41.47.255.255   0x292FFFFF
41.48.0.0       0x29300000      41.55.255.255   0x2937FFFF
41.56.0.0       0x29380000      41.56.255.255   0x2938FFFF
41.57.0.0       0x29390000      41.57.63.255    0x29393FFF
41.57.64.0      0x29394000      41.57.79.255    0x29394FFF
41.57.80.0      0x29395000      41.57.95.255    0x29395FFF
41.57.96.0      0x29396000      41.57.111.255   0x29396FFF
41.57.112.0     0x29397000      41.57.115.255   0x293973FF
41.57.116.0     0x29397400      41.57.119.255   0x293977FF

That's it. The reason I did this was to make searching for a row easier, if that row 'contained, or bounded, the given IP Address. Sounds harder than it is.

Put another way, I want to search for the row that my given IP Address (once also converted to byte array) is within.

Now writing the usual SQL is easy (example on SO here, for example), but I've got a feeling there is a clever way to index these columns in such a way that it will be efficient, but all I have done is text indexing, and there are 2 columns here that I'm doing math comparisons to, not letters of words over x characters long.

I'm using SQL Server 2012, and can also convert the data to anything better suited, as I own the DB.

Any thoughts?

Community
  • 1
  • 1
  • I don't understand the problem. Can't you just index on StartByteArray and say WHERE StartByteArray >= start AND StartByteArray <= end? – usr Apr 24 '14 at 19:18
  • @usr Thanks for writing. It isn't actually a _problem_, per se. It is more a question of what is possible with SQL Server. Is there a gain by indexing on 2 columns that aren't text-based? I couldn't find anything definitive on this kind of 2-column searching beyond what you suggest (same approach as the link in my OP). I'm just trying to benefit from the collective, as Google didn't help... :) – Compassionate Narcissist Apr 25 '14 at 01:02

1 Answers1

1

I sense there are some misunderstandings here. I hope I'll find them.

Indexing text columns is not different from indexing any other data type. A B-tree based index can index any data type that has a sort order. All it does is keep all index rows sorted by the key columns. This allows for range and point lookups. Binary data, string data and integer data are all fully supported.

Now writing the usual SQL is easy (example on SO here, for example)

This query does not solve your problem. It returns all rows where the StartByteArray would be in a given range. You want to opposite: You want the search argument to be in the range that a certain row specifies.

I already answered how to look up an IP range.

I've got a feeling there is a clever way to index these columns in such a way that it will be efficient

Just index on StartByteArray. That allows you to find the first row that matches a given IP.

but all I have done is text indexing

Not sure what you mean but whatever it is - it probably a misunderstanding.

Using a binary(4) to store IPs is clever. I never thought of doing that. I used a bigint in the past. That takes twice the amount of storage that would strictly be needed, though.

Community
  • 1
  • 1
usr
  • 168,620
  • 35
  • 240
  • 369
  • [post 1/2] Thanks again for writing. Let me explain a few things you pointed out. you:"Indexing text columns is not different from indexing any other data type." me:When indexing text, you specify what the smallest word is to be indexed by size (in characters), and in SQL Server (more?) they are 'linguistic' searches. you:"Just index on StartByteArray" me:OK. This assumes the columns are always in ORDER BY [ASC] for a 'WHERE' clause, right (per your linked answer)? Please let me know if I haven't cleared up the misunderstandings you mentioned. :) – Compassionate Narcissist Apr 25 '14 at 19:37
  • [post 2/2] Glad you think it's clever! I put the whole class up [here](http://blog.atcp.us/2014/04/25/vb-net-ip-manipulation-utility-functions/), if anyone can help improve on it... – Compassionate Narcissist Apr 25 '14 at 19:37
  • 1
    Oh, you speak of fulltext indexing. This is completely different then b-tree based indexing (`CREATE INDEX...`). I don't see applicability of FTS here. Did you mean to index text-formatted IPs with fulltext search?! Not suitable. "This assumes the columns are always in ORDER BY" => kind of. Indexing helps filtering and sorting. Maybe this will help you: Assume you have an array of IP ranges sorted by StartIP. How would you efficiently find the matching IP range for a given IP? You'd binary search to the first range that might match. And that's how you do it in SQL Server too. – usr Apr 25 '14 at 19:44
  • 1
    The "binary search" part would be called an index seek in SQL Server and it happens when you filter or sort on an indexed column: `TOP 1 ... WHERE StartIP >= IP ORDER BY StartIP` gives the first possibly matching IP range. – usr Apr 25 '14 at 19:45
  • Thanks @usr I thought all text indexing in SQL Server was the same, so that is where my language got confusing I think :) The problem (I think this is how it works in most all RDBMS, but not certain) with the "WHERE" clause is that it has to search every single row-even if it hits a match on the very first one. You see, even if I sort ASC by the StartIP, I was hoping the index on binary column could either (a) stop at first match (useless to continue after 1st match, in this case) or, (b) use index to go _straight_ to the correct row. I need to read up on binary indexes to see how it works! :) – Compassionate Narcissist Apr 26 '14 at 00:19
  • 1
    This article does not look too bad: https://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/ I searched for "sql server indexing basics". Maybe it helps you to look at the diagram of the physical tree. You can pretty much tell from it what operations an index can be good for by the way you can navigate in the tree.; Certainly, the whole table does not need to be scanned. Then, the index would be useless. Index a 1TB table and the answer still comes in 10ms. – usr Apr 26 '14 at 09:54