6

I got more than 7 million rows in a table and

SELECT COUNT(*) FROM MyTable where MyColumn like '%some string%'

gives me 20,000 rows and takes more than 13 seconds.

The table has NONCLUSTERED INDEX on MyColumn.

Is there any way to improve speed?

jong shin
  • 682
  • 1
  • 5
  • 19
  • 6
    Alas, `%substring%` cannot be really efficient even with indexes… However if you are looking for adjacent *words*, then possibly a full text index could help. – Benoit Nov 03 '11 at 06:26
  • The fact you're using the `LIKE` clauses means it's going to have to process *every* row anyway. – Moo-Juice Nov 03 '11 at 06:26
  • 1
    Using the LIKE operator with a leading `%` effectively makes any index useless. If you would be searching for `LIKE 'something%'` the index should be used and it should be **a lot** faster – marc_s Nov 03 '11 at 06:29
  • Try count(1) instead of count(*) – cnd Nov 03 '11 at 06:33
  • 3
    @nCy: this is a *myth* http://stackoverflow.com/questions/1221559/count-vs-count1/1221649#1221649 – gbn Nov 03 '11 at 06:35
  • But interestingly without COUNT(*), SELECT * FROM MyTable where MyColumn like '%some string%' executes really fast. Why is count(*) executes slow? I already tried count(1) and count(primarykey) – jong shin Nov 03 '11 at 06:40
  • 1
    Did you try with `SELECT COUNT(*)` then with `SELECT *` did you then try `SELECT COUNT(*)` again? it could be that your results are just cached in memory still. Using `LIKE '%something%'` will cause your index to be ignored, so it will have to read the whole table anyway, thus one would expect the performance of `SELECT *` to be the same as `SELECT COUNT(*)` – Seph Nov 03 '11 at 06:51
  • @jongshin: Probably because whatever client you are using is adding an implicit LIMIT, or else it is streaming the results. The time to get the *first* row is much faster for `SELECT *` than `SELECT COUNT(*)` – Mark Byers Nov 03 '11 at 06:58
  • There is an index on the MyColumn, so straeming is very likely – Chris Bednarski Nov 03 '11 at 07:54

3 Answers3

5

Leading wildcards searches can not be optimised with T-SQL and won't use an index

Look at SQL Server's full text search

gbn
  • 422,506
  • 82
  • 585
  • 676
3

You could try a full-text search, or a text search engine such as Lucene.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
2

Try using a binary collation first, which will mean that the complex Unicode rules are replaced by a simple byte comparison.

SELECT COUNT(*) 
FROM MyTable 
WHERE MyColumn COLLATE Latin1_General_BIN2 LIKE '%some string%'

Also, have a look at chapter titled 'Build your own index' in SQL Server MVP Deep Dives written by Erland Sommarskog

The basic idea is that you introduce a restriction to the user and require the string to be at least three contiguous characters long. Next, you extract all three letter sequences from the MyColumn field and store these fragments in a table together with the MyTable.id they belong to. When looking for a string, you split it into three letter fragments as well, and look up which record id they belong to. This way you find the matching strings a lot quicker. This is the strategy in a nutshell.

The book describes implementation details and ways to optimise this further.

Chris Bednarski
  • 3,364
  • 25
  • 33