1

I'm looking for optimal way to search through millions of records that contain serial number saved as varchar column which ends with specified string key.

I was using EndsWith, however performance is rather poor if several queries are sent.

Is there a better way to do it?

EDIT:

Since search key is of variable length, I can't create column that holds cut-off value of serial number. However, I've done some tests with using Substring and Equals vs EndsWith and I've lowered down execution speed to 40% of the one of EndsWith.

I'm still looking for better solution though :)

Admir Tuzović
  • 10,997
  • 7
  • 35
  • 71

1 Answers1

4

Unfortunately, searching for strings ending with a particular pattern is difficult on most databases+, because searching for string suffixes cannot use an index. This results in full table scans, which may be slow on tables with millions of rows.

If your database supports reverse indexes, add one for your string key column; otherwise, you can improve performance by simulating reverse indexes:

  • Add a column for storing your string key in reverse
  • If your RDBMS supports computed columns, add one for the reversed key
  • Otherwise, define a trigger that populates the reversed column from the key column
  • Create an index on the reversed column
  • Use the reversed column for your searches by passing in the reversed suffix that you are looking for.

For example, if you have data like this

key
-----------
01-02-3-xyz
07-12-8-abc

then the augmented table would have

key           rev_key
-----------   -----------
01-02-3-xyz   zyx-3-20-10
07-12-8-abc   cba-8-21-70

and your search for ENDS_WITH(key, '3-xyz') would ask for STARTS_WITH(rev_key, 'zyx-3'). Since string indexes speed up lookups by prefix, the "starts with" lookup would go much faster.


+ One notable exception is Oracle, which provides reverse key indexes specifically for situations like this.
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • I'll try your solution and see the actual performance improvement as soon as I generated reversed keys retroactively. I like the idea :) – Admir Tuzović Mar 25 '13 at 11:53
  • Unfortunately this has yielded no performance optimization. Speed has remained exactly the same as with EndsWith. Too bad, sounded like smart solution :( – Admir Tuzović Mar 25 '13 at 12:48
  • @BarisaPuter This is very strange. Could you try using `rev_key like 'zyx-3%'` instead of starts with? – Sergey Kalinichenko Mar 25 '13 at 12:54
  • 1
    This **will** improve performance if implemented correctly. @OP please post your queries, SQL and execution plans. – usr Mar 25 '13 at 12:55
  • I've implemented exactly as it was specified here. I'm using EF, so I'm not writing exact queries but rather using LINQ syntax. I've managed to get improvement with using Substring instead of EndsWith. Reverse serial number and StartsWith made no difference at all. It could be very likely due to the fact that Serial number is never longer than 10 characters. – Admir Tuzović Mar 25 '13 at 13:09
  • @BarisaPuter Since you are using EF, you may want to take a look at [this question](http://stackoverflow.com/a/11371822/335858), it explains how to use `LIKE` with EF through `SqlFunctions`. – Sergey Kalinichenko Mar 25 '13 at 13:27
  • @BarisaPuter Just to make sure, you added an index on the `rev_key` all by itself, with no other columns, right? – Sergey Kalinichenko Mar 25 '13 at 13:36
  • You want me to remove all other indexes on the table? Can I leave at least PK/identity column (serves as clustered index I believe)? – Admir Tuzović Mar 25 '13 at 13:37
  • @BarisaPuter No, keep all the indexes that you have, just make sure that the index on `rev_key` lists no other columns, like this: `create index on my_table(rev_key)` – Sergey Kalinichenko Mar 25 '13 at 13:45
  • No, it's added only for that column. There is some difference in speed, but it's more like 10% improvement. On the other hand, when I'm doing inserts in bulk, I'm going to get hit by delays due to extra index, so unless it gives huge performance boost, I'll rather stay away from this solution :( On the topic of SqlFunctions, I believe StartsWith is translated to LIKE 'key%' by EF itself. – Admir Tuzović Mar 25 '13 at 13:47
  • @BarisaPuter Out of curiosity, could you try the two queries (the original and the one w/reversed strings) in SQL to compare their performance? – Sergey Kalinichenko Mar 25 '13 at 13:56
  • I will accept your answer as correct as it is probably working in situations where you need to handle larger portions of text. However, I've done tests with all 3 approaches(startswith, endswith, substring) and speed of each query is around 2-4 seconds which I believe is ok. My problem is caused due to need to do about 100 of similar queries at the same time => I will most likely have to look for some other approach to speed up application behavior. Thanks a lot for the effort! – Admir Tuzović Mar 25 '13 at 15:29