CHARINDEX vs LIKE search gives very different performance, why?

Question

We use Entity Frameworks for DB access and when we "think" LIKE statement - it actually generates CHARINDEX stuff. So, here is 2 simple queries, after I simplified them to prove a point on our certain server:

-- Runs about 2 seconds
SELECT * FROM LOCAddress WHERE Address1 LIKE '%1124%' 
-- Runs about 16 seconds
SELECT * FROM LOCAddress WHERE ( CAST(CHARINDEX(LOWER(N'1124'), LOWER([Address1])) AS int)) = 1

Table contains about 100k records right now. Address1 is VarChar(100) field, nothing special.

Here is snip of 2 plans side by side. Doesn't make any sense, shows 50% and 50% but execution times like 1:8

I searched online and general advice is to use CHARINDEX instead of LIKE. In our experience it's opposite. My question is what causing this and how we can fix it without code change?

Check out [this other SO question](http://stackoverflow.com/questions/7341783/entity-framework-data-annotations-set-stringlength-varchar/7341920#7341920) - you could possibly use data annotations to **tell** EF that this column is a `VARCHAR` and cause it to **stop** interpreting that column as a `nvarchar` when it really isn't ... — marc_s, Sep 28 '15 at 15:50

score 22 · Accepted Answer · edited Sep 28 '15 at 15:48

I will answer my own question since it was hard to find correct answer and I was pointed to the problem by SQL Server 2012 Execution Plan output. As you see in original question - everything looks OK on surface. This is SQL Server 2008.

When I run same query on 2012 I got warning on CHARINDEX query. Problem is - SQL Server had to do type conversion. Address1 is VarChar and query has N'1124' which is Unicode or NVarChar. If I change this query as so:

SELECT * 
FROM LOCAddress 
WHERE (CAST(CHARINDEX(LOWER('1124'), LOWER([Address1])) AS int))

It then runs same as LIKE query. So, type conversion that was caused by Entity Framework generator was causing this horrible hit in performance.

I was also about to mention it but I preferred to point out T-SQL perspective of CHARINDEX and LIKE. Nevertheless, I am glad you have solved your issue yourself :). You should mark your response as answer for future readers ! — Anuj Tripathi, Sep 25 '15 at 19:39

score 6 · Answer 2 · answered Sep 25 '15 at 18:54

First, as you can see both queries are identical and neither can use index. CHARINDEX and LIKE perform same with wildcard. Ex: %YourValue%. However, there performance varies when you use wildcard like 'YourValue%'. Here, LIKE operator will likely to perform faster than CHARINDEX because it may allow partial scan of the index. Now, in your case, both queries are same but there performance is difference because of following possible reason:

Statistics: SQL Server maintains statistics for sub string in string columns which are use by LIKE operator but not fully usable for CHARINDEX. In that case, LIKE operator will work faster than CHARINDEX. You can force SQL Server to use index for CHARINDEX with proper table hints

Ex: FROM LOCAddress WITH (INDEX (index_name))

Read more Here, which in section "string summary stastics" says:

SQL Server 2008 includes patented technology for estimating the selectivity of LIKE conditions. It builds a statistical summary of substring frequency distribution for character columns (a string summary). This includes columns of type text, ntext, char, varchar, and nvarchar. Using the string summary, SQL Server can accurately estimate the selectivity of LIKE conditions where the pattern may have any number of wildcards in any combination.

LIKE will use indexes *only* for prefix scans. This doesn't have anything to do with selectivity. A prefix scan is effectively a range search starting with the prefix and ending with the next word after it alphabetically. This allows the server to use the index to perform the equivalent of `where field>='YourValue' and field<'YourValuf'` — Panagiotis Kanavos, Sep 28 '15 at 15:31

CHARINDEX vs LIKE search gives very different performance, why?

2 Answers2

Linked