106

This question skirts around what I'm wondering, but the answers don't exactly address it.

It would seem that in general '=' is faster than 'like' when using wildcards. This appears to be the conventional wisdom. However, lets suppose I have a column containing a limited number of different fixed, hardcoded, varchar identifiers, and I want to select all rows matching one of them:

select * from table where value like 'abc%'

and

select * from table where value = 'abcdefghijklmn'

'Like' should only need to test the first three chars to find a match, whereas '=' must compare the entire string. In this case it would seem to me that 'like' would have an advantage, all other things being equal.

This is intended as a general, academic question, and so should not matter which DB, but it arose using SQL Server 2005.

Community
  • 1
  • 1
  • 30
    One major thing you left out is whether or not `value` is indexed. If it is, then `=` is a simple lookup with no table scan necessary and will beat the pants off of any `LIKE` statement that you throw at it. – Daniel DiPaolo May 26 '11 at 16:59
  • 8
    @Daniel I think that's incorrect. A `LIKE` with a wildcard at the end is SARGable and thus will perform a range seek on an index, no table scan in sight. That range seek can compete quite handily with an `=` statement, and in many cases (like if all the satisfying rows are on one page, a not unlikely condition) could be exactly the same performance, entailing the same number of reads. – ErikE May 26 '11 at 17:47
  • My "all other things being equal" was intended to cover the "indexed or not" issue, but there seems to be at least some controversy over how much difference that would make, per my comments on the other answers. – MickeyfAgain_BeforeExitOfSO May 26 '11 at 18:06
  • See my answer. I initially tested unindexed and performance is identical (both table scans were exactly the same). I assumed for my test scenario that it would be indexed, otherwise why would you even care about performance? – JNK May 26 '11 at 18:09
  • 5
    All the talk of 'like' in this question and the answers makes us sound like a bunch of high school girls. Like, totally. – JulianR May 26 '11 at 23:34

10 Answers10

80

See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx

Quote from there:

the rules for index usage with LIKE are loosely like this:

  • If your filter criteria uses equals = and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK

  • If your filter criteria uses LIKE, with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index. The increased cost is almost nothing.

  • If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER') it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.

  • HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match.

(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient and that may include an INDEX SCAN instead of an INDEX SEEK)

Tim M.
  • 105
  • 2
  • 8
BonyT
  • 10,750
  • 5
  • 31
  • 52
56

It's a measureable difference.

Run the following:

Create Table #TempTester (id int, col1 varchar(20), value varchar(20))
go

INSERT INTO #TempTester (id, col1, value)
VALUES
(1, 'this is #1', 'abcdefghij')
GO

INSERT INTO #TempTester (id, col1, value)
VALUES
(2, 'this is #2', 'foob'),
(3, 'this is #3', 'abdefghic'),
(4, 'this is #4', 'other'),
(5, 'this is #5', 'zyx'),
(6, 'this is #6', 'zyx'),
(7, 'this is #7', 'zyx'),
(8, 'this is #8', 'klm'),
(9, 'this is #9', 'klm'),
(10, 'this is #10', 'zyx')
GO 10000

CREATE CLUSTERED INDEX ixId ON #TempTester(id)CREATE CLUSTERED INDEX ixId ON #TempTester(id)

CREATE NONCLUSTERED INDEX ixTesting ON #TempTester(value)

Then:

SET SHOWPLAN_XML ON

Then:

SELECT * FROM #TempTester WHERE value LIKE 'abc%'

SELECT * FROM #TempTester WHERE value = 'abcdefghij'

The resulting execution plan shows you that the cost of the first operation, the LIKE comparison, is about 10 times more expensive than the = comparison.

If you can use an = comparison, please do so.

JNK
  • 63,321
  • 15
  • 122
  • 138
  • 4
    +1 for actually testing it. Just looking at the showplan may not tell the whole story though. I'm going to do some of my own testing and will let everyone know if I find anything unexpected. – Tom H May 26 '11 at 18:23
  • 1
    Tom - true, but it gave me enough of an indication that the two were NOT processed the same behind the scenes. – JNK May 26 '11 at 18:24
  • 1
    The costs shown in the execution plan are wrong. They do not reflect actual performance. In the first plan they are based on an estimated rowcount of `19.95` so SQL Server costs in an additional 19 key lookups that never materialize in actuality (Even in the **actual** execution plan the costs shown are based on the **Estimated** sub tree cost) – Martin Smith May 26 '11 at 18:39
  • 1
    I've just done your test as well as one with about 1M rows and in both cases performance and the query plans were identical. This is on SQL 2008 as I don't have 2005 on this machine. – Tom H May 26 '11 at 18:41
  • I did it on 2008. Query plan **operators** were identical, not the costs. Let me do another check... – JNK May 26 '11 at 18:44
  • If I'm reading the results right, I get identical costs. SQL 2005. – MickeyfAgain_BeforeExitOfSO May 26 '11 at 18:48
  • Look at `STATISTICS IO` - the `LIKE` is generating more Logical Reads for me. – JNK May 26 '11 at 18:48
  • I'm showing about 10% more logical reads for the `LIKE` operator. – JNK May 26 '11 at 18:53
  • @JNK - I get `Scan count 1, logical reads 5, physical reads 0` on running the table population script and queries in your answer (for both). What do you get then? Also are you still running against the same `90,001` rows? – Martin Smith May 26 '11 at 19:08
  • @Martin - I increased rowcount to do some more testing. It's a small diff but I got 7 and 5 at 100k rows and 41 and 36 at 900k – JNK May 26 '11 at 19:17
  • @Martin - forgot to mention, I also materialized into a non-temp table which seems to have made the biggest difference. – JNK May 26 '11 at 19:22
  • If you try `DBCC TRACEON (652)` then your queries then `DBCC TRACEOFF (652).` do you still see the difference in logical reads? (this temporarily disables read ahead so you might not want to do this on a production box!) – Martin Smith May 26 '11 at 19:25
  • I'll see if I can reproduce this end then. I suspect that the extra logical reads are coming from additional logical reads due to the read ahead mechanism. Perhaps SQL Server is a bit more eager to do that on a range seek than an equality seek. The background to this is mentioned in the comments to this post http://sqlblog.com/blogs/paul_white/archive/2011/02/17/so-is-it-a-seek-or-a-scan.aspx – Martin Smith May 26 '11 at 19:34
  • Without the indexes, the performance of the SELECT from the 900000 row table takes 358ms for `LIKE` and 250ms for `=` - presumably because the `LIKE` is more CPU intensive as the I/O is unsurprisingly identical. – Will A May 26 '11 at 19:41
  • @Will - check it on a temp table vs a materialized table – JNK May 26 '11 at 19:47
  • With `1,800,001` rows I saw a differerence of `Scan count 1, logical reads 7, physical reads 0` vs `Scan count 1, logical reads 6, physical reads 0` but on disabling read ahead they both gave the same figures – Martin Smith May 26 '11 at 19:48
  • Than I guess it's marginal-to-0 difference :) – JNK May 26 '11 at 19:49
  • 1
    @JNK - just tried it - there's a negligible difference, the disparity is the same, however. 327ms for `LIKE`, 203ms for `=`. I expect if I ran more tests and took accurate averages, there'd be no real difference between #temp and real table. – Will A May 26 '11 at 19:50
14

You should also keep in mind that when using like, some sql flavors will ignore indexes, and that will kill performance. This is especially true if you don't use the "starts with" pattern like your example.

You should really look at the execution plan for the query and see what it's doing, guess as little as possible.

This being said, the "starts with" pattern can and is optimized in sql server. It will use the table index. EF 4.0 switched to like for StartsWith for this very reason.

Blindy
  • 65,249
  • 10
  • 91
  • 131
  • 2
    No relational database worth its salt will ignore an index when the like pattern is part of the query and the wildcard is trailing. That may be a different story if you're binding the value and the database supports binding separate from query preparation. – Dave W. Smith May 26 '11 at 17:04
  • That's what my gut is telling me too, but I only have hands-on experience with sql server in this regard, so I focused on it specifically. – Blindy May 26 '11 at 17:08
8

A personal example using mysql 5.5: I had an inner join between 2 tables, one of 3 million rows and one of 10 thousand rows.

When using a like on an index as below(no wildcards), it took about 30 seconds:

where login like '12345678'

using 'explain' I get:

enter image description here

When using an '=' on the same query, it took about 0.1 seconds:

where login ='12345678'

Using 'explain' I get:

enter image description here

As you can see, the like completely cancelled the index seek, so query took 300 times more time.

Aris
  • 4,643
  • 1
  • 41
  • 38
8

You are asking the wrong question. In databases is not the operator performance that matters, is always the SARGability of the expression, and the coverability of the overall query. Performance of the operator itself is largely irrelevant.

So, how do LIKE and = compare in terms of SARGability? LIKE, when used with an expression that does not start with a constant (eg. when used LIKE '%something') is by definition non-SARGabale. But does that make = or LIKE 'something%' SARGable? No. As with any question about SQL performance the answer does not lie with the query of the text, but with the schema deployed. These expression may be SARGable if an index exists to satisfy them.

So, truth be told, there are small differences between = and LIKE. But asking whether one operator or other operator is 'faster' in SQL is like asking 'What goes faster, a red car or a blue car?'. You should eb asking questions about the engine size and vechicle weight, not about the color... To approach questions about optimizing relational tables, the place to look is your indexes and your expressions in the WHERE clause (and other clauses, but it usually starts with the WHERE).

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
7

If value is unindexed, both result in a table-scan. The performance difference in this scenario will be negligible.

If value is indexed, as Daniel points out in his comment, the = will result in an index lookup which is O(log N) performance. The LIKE will (most likely - depending on how selective it is) result in a partial scan of the index >= 'abc' and < 'abd' which will require more effort than the =.

Note that I'm talking SQL Server here - not all DBMSs will be nice with LIKE.

Will A
  • 24,780
  • 5
  • 50
  • 61
  • I don't think you know how binary search works. Both the `=` case and the `like '...%'` case behave the same if sql recognizes the pattern (and it does), because in both cases the sub-trees are chosen based on comparison relations. – Blindy May 26 '11 at 17:07
  • Oh, I do. LIKE will most likely behave worse although it'll still be O(log N) if the selectivity is high enough - O(log N) to find out where to start the partial scan from, then a number of forward reads through the index until the end point `'abd'` is reached. – Will A May 26 '11 at 17:09
  • Yes but the OP's example assumes there is only one value in that range, so with that in mind, the comparisons will be identical. – Blindy May 26 '11 at 17:14
  • Valid point - it's not _completely_ clear that this is what the OP was saying, but I think it's more likely the case than not. In that case, the performance will be pretty much identical. – Will A May 26 '11 at 17:17
  • The range seek of a LIKE will likely compete quite handily with an = statement, and in many cases (like if all the satisfying rows are on one page, a not unlikely condition) could be exactly the same performance, entailing the same number of reads. I think saying "will require more effort" is a mistaken blanket statement. – ErikE May 26 '11 at 17:48
  • If I'm understanding what's said here, then if unindexed the performance is negligible because the the table scan is so much more costly than the presumably required comparison of each row (as implied in my question). But that row scan *will* - the heart of my question - make a difference (?). On the other hand, if indexed, and with yes, a single value in each range, performance will also be virtually identical. – MickeyfAgain_BeforeExitOfSO May 26 '11 at 18:18
  • I would say, indexed or not, the performance would be very close regardless of `=` or `LIKE` if the `LIKE` matches the same number of rows as the `=`. The performance will be poor unindexed, and great indexed. Using `LIKE 'abc%'` because you happen to know it matches just one row, however, is confusing and may be dangerous in the future should another matching row come along. – Will A May 26 '11 at 19:29
5

= is much faster than LIKE, even without wildcard. I tested on MySQL with 11GB of data and more than 100 million of records, the f_time column is indexed.

SELECT * FROM XXXXX WHERE f_time = '1621442261' 
#took 0.00sec and return 330 records

SELECT * FROM XXXXX WHERE f_time LIKE '1621442261' 
#took 44.71sec and return 330 records
Zmnako Awrahman
  • 538
  • 7
  • 19
beloncfy
  • 91
  • 1
  • 1
2

Besides all the answers, there this to consider:

'like' is case insensitive, so every character needs to be compared twice, whereas the '=' only compares once for identical characters.

This issue arises with or without indexes.

1

I was working with a huge database that has more then 400M records and I put LIKE in search query. Here is the final results.

There were three tables tb1, tb2 and tb3. When I use EQUAL for in all tables QUERY the response time was 193ms. and when I put LIKE in one of he table the response time was 19.22 sec. and for all table LIKE response time was 112 Sec

Engr.Aftab Ufaq
  • 3,356
  • 3
  • 21
  • 47
0

Maybe you are looking about Full Text Search.

In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.