4

I have some performance issues.

I have a table with about 2 million rows.

CREATE TABLE [dbo].[M8](
    [M8_ID] [int] IDENTITY(1,1) NOT NULL,
    [APPLIC] [char](8) NOT NULL,
    [NIVALERTE] [numeric](1, 0) NOT NULL,
    [LOGDH] [datetime2](7) NULL,
    [USERX] [char](20) NOT NULL,
    [TACHE] [char](3) NOT NULL,
    [PRG] [char](32) NOT NULL,
    [DOS] [numeric](3, 0) NOT NULL,
    [ERRNUM] [numeric](5, 0) NOT NULL,
    [LOGTXT] [char](200) NOT NULL)

I read them with C# and ADO.NET

In the management studio (SQL Server 2008 R2), with that query :

SELECT 
    M8.M8_ID, M8.APPLIC, M8.NIVALERTE, M8.LOGDH, M8.USERX, M8.TACHE, 
    M8.PRG, M8.DOS, M8.ERRNUM, M8.LOGTXT
FROM 
    M8 AS M8 WITH(NOLOCK) 
WHERE 
    ((M8.APPLIC LIKE 'DAV' ) )
ORDER BY 
    M8.LOGDH DESC, M8.M8_ID ASC
OPTION (FAST 1)

It take about 1 minute to have the first rows.

But, with

DECLARE @APPLIC_ZOOMAPRESCLE_ZOOM_LIKE_APPLIC_WHERE_0 as char(8) = 'DAV'

SELECT 
   M8.M8_ID, M8.APPLIC, M8.NIVALERTE, M8.LOGDH, M8.USERX, M8.TACHE, 
   M8.PRG, M8.DOS, M8.ERRNUM, M8.LOGTXT
FROM 
   M8 AS M8 WITH(NOLOCK) 
WHERE 
   ((M8.APPLIC LIKE @APPLIC_ZOOMAPRESCLE_ZOOM_LIKE_APPLIC_WHERE_0 ) )
ORDER BY 
   M8.LOGDH DESC, M8.M8_ID ASC
OPTION(FAST 1)

I get the first rows after 4 seconds.

PS : I know, I have no % in the like.

Edit: Here are the execution plans https://www.dropbox.com/sh/jgai5f9txbs84x6/EP5_hj8DNv

Xavinou
  • 802
  • 1
  • 6
  • 18
  • 1
    do you run them in that order? – Jodrell Sep 17 '12 at 08:44
  • 3
    Then it is better to use = instead of like if you don't need it. – András Ottó Sep 17 '12 at 08:44
  • @Jodrell : I cleared the cache between them – Xavinou Sep 17 '12 at 08:45
  • @AndrásOttó : I kown, but it is a generic query. In that program, I can't... – Xavinou Sep 17 '12 at 08:46
  • Notably `[APPLIC]` is a `char(8)`, I'm wondering if it pays to compare "like for like", excuse the pun. – Jodrell Sep 17 '12 at 08:50
  • Indeed, if I write `LIKE 'DAV '`, it is fast. But, my engine doesn't know that... – Xavinou Sep 17 '12 at 08:54
  • @Xavinou, I suspect that when the comprison value is of an equal length the operation is treated like an equals (this makes sense.) In that situation, an index on `[APPLIC]` can be used. Have you compared the query plans for the two operations? – Jodrell Sep 17 '12 at 09:03
  • However the query is interpreted, the query plan will show you. – Jodrell Sep 17 '12 at 09:15
  • @Jodrell I've added the execution plans. – Xavinou Sep 17 '12 at 12:57
  • Do you have auto update statistics turned off? In any event try updating the stats on that column. It converts the `LIKE 'DAV'` to a range seek on `>= DAV` and `<= DAV` and estimates that 31 rows will match, though in fact 476,672 do so it ends up doing a whole load of key lookups and needs to sort this half million rows first before it can emit the first one. The other plan still needs to do the sort but has a better estimate and uses a parallel plan with a scan and no look ups. – Martin Smith Sep 17 '12 at 13:15
  • Also the memory grant for the sort in the slow version is probably no where near adequate due to the bad estimates so this is spilling to `tempdb`. The memory grants are 1,024 vs 340,896 in the two plans despite the fact they both need to sort the exact same data. – Martin Smith Sep 17 '12 at 13:23
  • @MartinSmith Even after updating stats, it estimates 30 rows. – Xavinou Sep 17 '12 at 13:47
  • Strange. What does the output of `DBCC SHOW_STATISTICS ("dbo.M8", INDEX_A);` look like? Can you see where it is getting this 30 row figure from? – Martin Smith Sep 17 '12 at 13:51
  • @MartinSmith I've added the result on my dropbox. I've added one million rows too. With a total of 3,500,288 rows and 1,099,264 rows are matching. Estimation : 72. And I have no idea where it's getting those 72... – Xavinou Sep 17 '12 at 14:04
  • I can reproduce this end as well by adding a load of rows with value `DAV` and then trying your query. This seems like a bug in the statistics estimation to me. Actually despite the fact that the plan shows as range seek `(M8.APPLIC >= 'DAV' and M8.APPLIC <= 'DAV' )` that query does have accurate estimates the stats problem only affects the `LIKE` version. As you noted if padded out with 5 trailing spaces then estimates are accurate again too. It looks like it uses an exact textual match for the stats even though the query itself doesn't require that. – Martin Smith Sep 17 '12 at 14:13
  • It means i'm screwed... Until to find something, I'm gonna add the % because I don't have the exact length of the column. – Xavinou Sep 17 '12 at 14:47
  • 1
    @Xavinou [Reported here so see what they say](https://connect.microsoft.com/SQLServer/feedback/details/763060/statistics-estimation-for-like-on-a-char-column-can-be-very-wrong) – Martin Smith Sep 17 '12 at 15:09
  • whaou, you've even create a bug report :D Thanks for your help. – Xavinou Sep 17 '12 at 15:20

2 Answers2

5

Your table has 1,517,820 rows. Of these nearly one third (476,672) contain the value DAV (or more accurately the value DAV      as it is of CHAR(8) datatype so is padded out with trailing spaces.

In a LIKE comparison trailing spaces in the match_expression are not significant (though they are significant in the pattern itself).

Therefore the expression WHERE APPLIC LIKE 'DAV' does in fact match 476,672 rows. Neither of the execution plans estimate anywhere near this however. Though the faster plan (with variables) is three orders of magnitude closer.

+-----------------------+-----------+-----------+
|                       | Slow Plan | Fast Plan |
+-----------------------+-----------+-----------+
| Estimated # Rows      | 32        | 47,343    |
| Memory Grant          | 1 MB      | 333 MB    |
| Degree of Parallelism | 1         | 4         |
+-----------------------+-----------+-----------+

For the plan with the variables as SQL Server does not do variable sniffing (without e.g. the OPTION (RECOMPILE) hint) it falls back on guesses as to how many rows will match the predicate and comes up with an estimate that around 3.1% of the table will qualify.

Fast Plan

Fast

The plan with the literal value ought to have much better estimates. The screen shot you supplied of the DBCC SHOW_STATISTICS output (after adding another million rows) shows that DAV is definitely in there

Stats

Unfortunately it seems that although the trailing space in the column values are not significant for the query result their presence does mess up the cardinality estimates (Reported as a bug here and currently stated to be fixed in the next version). As a result of this problem it estimates only a handful of rows will be returned and comes up with the following plan.

Slow Plan

Slow

As well as performing half a million key lookups because of the poor cardinality estimates the memory grant is probably no where near adequate for the size of data being sorted resulting in spills to tempdb.

There are many work arounds you might consider if you can change the query or table schema.

  • Using = instead of LIKE
  • Changing the WHERE clause to LIKE CAST('DAV' AS CHAR(8))
  • Changing the column datatype to VARCHAR(8) (and ensuring all stored values are trimmed).
  • Dropping the current index being seeked (Index_A). You haven't supplied its definition but if it is a single column index on a column with few distinct values its presence may be more of a hindrance than a help (depending on your query workload))
  • Adding a covering index with key column APPLIC (and possibly LOGDH DESC, M8_ID ASC to avoid a sort) and the other referenced columns as INCLUDED.
Community
  • 1
  • 1
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Thank you very much for these explanations and workarounds. I think I'll either add a %, because with DB2 `= 'DAV'` not return `'DAV '` or cast 'DAV' as char(8). – Xavinou Sep 19 '12 at 07:35
  • @Xavinou - The reason I didn't suggest the `%` is because it might change the semantics if you were to (at some time in the future) need another value of `APPLIC` that begins with `DAV`. It will bring back those rows as well. – Martin Smith Sep 19 '12 at 07:38
0

Maybe these 2 questions will give you a better understanding of the performance issues involved on the LIKE and = operators:

Community
  • 1
  • 1
Yaroslav
  • 6,476
  • 10
  • 48
  • 89
  • Thanks, but it is not really the use of like or (=) i'm interested in, but the perf issues with using a DECLARE or not – Xavinou Sep 17 '12 at 08:56
  • 1
    @Xavinou but reading the articles might give you some insight. I beleive that when the right hand side of the comparison is the right length the engine treats it like an equals and makes use of an index. When it is shorter, it does not. – Jodrell Sep 17 '12 at 08:59
  • @Jodrell - Without a leading wildcard the `LIKE` can use a range seek. Even with a leading wildcard it can use a range seek but the range is the whole index. The issue is with the `DECLARE` as suspected by the OP. SQL Server doesn't do variable sniffing so doesn't know how selective the query will be. – Martin Smith Sep 17 '12 at 09:05
  • You are right @Jodrell, that was my point with my answer. I believe checking the execution plans should give you some hint on this issue... oopss... Martin Smith just commented, another hint on the use of variables. – Yaroslav Sep 17 '12 at 09:10
  • @MartinSmith, but, the OP says the `DECLARE` query is significantly faster? Are you saying the engine is not checking the parameter for wildcards before forming the plan? – Jodrell Sep 17 '12 at 09:11
  • 1
    @Jodrell - True hadn't spotted that the OP was saying that the variable version was faster. They would need to post the plans so we can see the difference. But yes SQL Server compiles a plan that does not take account of the actual variable value unless the statement is subject to a recompile after the variable is assigned. – Martin Smith Sep 17 '12 at 09:16
  • @MartinSmith, makes sense, it wouldn't be very reusable. – Jodrell Sep 17 '12 at 09:19
  • @MartinSmith I've added the execution plans. – Xavinou Sep 17 '12 at 12:57
  • 1
    @Jodrell - Not sure if you looked at the plans but turns out that the slow one **was** in fact using an index and that was the problem. The fast one does a scan. In the OP's case they have many rows with the value `DAV` and even though [these are present in the stats](https://www.dropbox.com/sh/jgai5f9txbs84x6/ZfJdjxbf6w/stats_index_a.png) the statistics estimation for the `LIKE` seems to fail to realise these values will match the predicate. – Martin Smith Sep 18 '12 at 07:15