
We have a query that runs against a fairly large table and unfortunately needs to use LIKE '%ABC%' on a couple of varchar fields so the user can search on partial names, etc. This is on SQL Server 2005.

Would adding an index on these varchar fields help at all in terms of select query performance when using LIKE, or does SQL Server basically ignore the indexes and do a full scan in those cases?

Any other possible ways to improve performance when using LIKE?

SelVazi
schooner

5 Answers


Only if you add full-text searching to those columns, and use the full-text query capabilities of SQL Server.

Otherwise, no, an index will not help.
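For illustration, here is a minimal sketch of the full-text approach, using hypothetical object names (dbo.Customers, its key index PK_Customers, and the ftCatalog catalog are all made up; a full-text index requires a unique, single-column, non-nullable key index):

create fulltext catalog ftCatalog as default
go
--hypothetical table/index names; requires the full-text engine to be installed
create fulltext index on dbo.Customers(CustomerName) key index PK_Customers
go
--word/prefix search instead of LIKE '%ABC%'
select * from dbo.Customers where contains(CustomerName, '"ABC*"')

Bear in mind that CONTAINS matches whole words (or word prefixes, as above), not arbitrary substrings - see the comments below for that caveat.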

Lasse V. Karlsen
  • Thanks, that is what I thought, unfortunately. I've removed some of the LIKE clauses to help speed things up a bit. – schooner Apr 30 '09 at 09:38
  • A LIKE with wildcards on both sides of the string (as the question requires) is not supported by full-text search – gio Apr 22 '22 at 10:23
  • Correct, but with full text search you can get similar queries, just with different syntax and capabilities. – Lasse V. Karlsen Apr 22 '22 at 11:14
  • If you use full-text search, it will only find whole words inside the field, not what he wants to do (LIKE '%str%'). With full-text search the query would be WHERE CONTAINS(field, 'str') and it would return only records containing the whole word 'str', not 's' or 'st' etc. Am I right? – Kemal AL GAZZAH May 31 '22 at 08:39
  • see this question https://stackoverflow.com/questions/60248568/search-for-part-of-the-word-in-the-phrase-with-full-text-search-in-sql-server-20 – Kemal AL GAZZAH May 31 '22 at 08:41

You can potentially see performance improvements by adding index(es); it depends a lot on the specifics :)

How much of the total row size do your predicated columns account for? How many rows do you expect to match? Do you need to return all rows that match the predicate, or just the top 1 or top n rows?

If you are searching for values with high selectivity/uniqueness (so few rows to return), and the predicated columns are a smallish portion of the entire row size, an index could be quite useful. It will still be a scan, but your index will fit more rows per page than the source table.

Here is an example where the total row size is much greater than the column size to search across:

create table t1 (v1 varchar(100), b1 varbinary(8000))
go
--add 10k rows of filler
insert t1 values ('abc123def', cast(replicate('a', 8000) as varbinary(8000)))
go 10000
--add 1 row to find
insert t1 values ('abc456def', cast(replicate('a', 8000) as varbinary(8000)))
go

set statistics io on 
go
select * from t1 where v1 like '%456%'
--shows 10001 logical reads

--create index that only contains the column(s) to search across
create index t1i1 on t1(v1)
go
select * from t1 where v1 like '%456%'
--shows 37 logical reads

If you look at the actual execution plan you can see the engine scanned the index and did a bookmark lookup on the matching row. You can also tell the optimizer directly to use the index, if it hadn't decided to use this plan on its own: select * from t1 with (index(t1i1)) where v1 like '%456%'

If you have a bunch of columns to search across but only a few that are highly selective, you could create multiple indexes and use a reduction approach. E.g. first determine a set of IDs (or whatever your PK is) from your highly selective index, then search your less selective columns with a filter against that small set of PKs, as in the sketch below.
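A rough sketch of that reduction idea, with made-up names (a people table keyed by id, a selective indexed last_name column, and a much less selective notes column):

--narrow the candidate set with the selective, indexed column first
select p.*
from people p
join (select id from people where last_name like '%456%') c on c.id = p.id
--the less selective predicate is then only checked against the small candidate set
where p.notes like '%some term%'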

If you always need to return a large set of rows you would almost certainly be better off with a table scan.

So the possible optimizations depend a lot on the specifics of your table definition and the selectivity of your data.

HTH! -Adrian

ahains
  • Also consider INCLUDE indexes if you're returning a subset of the columns in a wide table. – Rob Garrison Apr 30 '09 at 17:18
  • Also consider the data you are querying. For example, adding a filtered index that excludes NULLs and/or empty strings, and then using that index in your query, will help performance a lot. The index will be smaller, so there is less for the LIKE to scan. – Jeff Widmer Aug 24 '17 at 17:24

The only other way (other than using full-text indexing) you could improve performance is to use LIKE 'ABC%' - don't add the wildcard on both ends of your search term - in that case, an index can be used.

If your requirements are such that you have to have wildcards on both ends of your search term, you're out of luck...
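A tiny example to illustrate the difference (table and index names are made up):

create table customers (last_name varchar(100))
create index ix_customers_last_name on customers(last_name)
go
--index seek possible: the index is sorted on the leading characters
select * from customers where last_name like 'ABC%'
--no seek possible: the leading wildcard forces a scan
select * from customers where last_name like '%ABC%'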

Marc

marc_s
  • Does this work only for `LIKE ABC%`, or does it work also for `LIKE %ABC`? Also, I'm curious why it works with just one sided wildcard... Could you elaborate on that, please? – Tom Pažourek Sep 15 '17 at 07:36
  • @TomPažourek: well - imagine a phone book (if you're old enough to remember what that was :-); if you're searching for someone whose last name **starts with** `Smi`, you'll find `Smith` and `Smithers` etc. fairly quickly. But if you search for someone whose last name **contains** (anywhere in the name) e.g. `chuk`, having the sorted list of names (which is what an **index** really is) doesn't really help you much - you still have to scan over **all names** to find those with that string in their name – marc_s Sep 15 '17 at 07:59
  • Thanks. :-) Do you happen to know what data structure is used? Is it some form of a trie? Does that mean that `%ABC` won't be using any index? You only mentioned that wildcard on both ends won't work. Will wildcard on the right end work? – Tom Pažourek Sep 15 '17 at 08:03

LIKE '%ABC%' will always perform a full table scan. There is no way around that.

You do have a couple of alternative approaches. First, full-text searching: it's really designed for this sort of problem, so I'd look at that first.

Alternatively, in some circumstances it might be appropriate to denormalize the data and pre-process the target fields into appropriate tokens, then add these possible search terms into a separate one-to-many search table. For example, if my data always consisted of a field containing the pattern 'AAA/BBB/CCC' and my users were searching on BBB, then I'd tokenize that out at insert/update (and remove on delete), roughly as sketched below. This would also be one of those cases where using triggers, rather than application code, would be much preferred.
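A hedged sketch of what that could look like, with made-up names (an items table with an int primary key id and a varchar field column holding the 'AAA/BBB/CCC' pattern; a real splitter would have to handle a variable number of parts):

create table item_tokens (item_id int not null, token varchar(100) not null)
create index ix_item_tokens on item_tokens(token, item_id)
go
create trigger trg_items_tokens on items after insert, update, delete
as
begin
    --remove tokens for deleted/updated rows
    delete t from item_tokens t join deleted d on d.id = t.item_id
    --re-tokenize inserted/updated rows; parsename splits dot-separated strings,
    --so '/' is swapped for '.' first (illustrative only, handles max 4 parts)
    insert item_tokens (item_id, token)
    select i.id, parsename(replace(i.field, '/', '.'), n.n)
    from inserted i
    cross join (select 1 as n union all select 2 union all select 3) n
    where parsename(replace(i.field, '/', '.'), n.n) is not null
end
go
--search the middle token with an index seek instead of LIKE '%BBB%'
select i.* from items i join item_tokens t on t.item_id = i.id where t.token = 'BBB'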

I must emphasize that this is not really an optimal technique and should only be used if the data is a good match for the approach and, for some reason, you do not want to use full-text search (and the database performance of the LIKE scan really is unacceptable). It's also likely to produce maintenance headaches further down the line.

Cruachan

Create statistics on that column. SQL Server 2005 has optimized in-string searching, so you might benefit from that.
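For what it's worth, a minimal example with hypothetical table/column names; string statistics can improve the optimizer's row estimates for LIKE predicates, but they won't avoid the scan itself:

--hypothetical names; helps cardinality estimation for LIKE, not the scan
create statistics st_customers_name on dbo.Customers(CustomerName)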

Mladen Prajdic