
We have a site in development that when we deployed it to the client's production server, we started getting query timeouts after a couple of hours.

This was with a single user testing it and on our server (which is identical in terms of Sql Server version number - 2005 SP3) we have never had the same problem.

One of our senior developers had come across similar behaviour in a previous job. He ran a query to manually update the statistics and the problem magically went away - the query returned in a few milliseconds.

A couple of hours later, the same problem occurred. So we again manually updated the statistics and again, the problem went away. We've checked the database properties and sure enough, auto update statistics is TRUE.

As a temporary measure, we've set up a task to update stats periodically (roughly as sketched below), but clearly this isn't a good solution.
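For reference, the stopgap looks roughly like this - the database and table names here are placeholders, not the client's actual schema:

```sql
-- Rough sketch of the stopgap (names are placeholders, not the real schema)

-- Confirm the database-level setting really is on:
SELECT name, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'OurDatabase';                   -- hypothetical database name

-- What the scheduled task runs: refresh statistics across the database...
EXEC sp_updatestats;

-- ...or, when the timeouts start, target the affected table with a full scan:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;   -- dbo.Orders is a placeholder
```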

The developer who experienced this problem before is certain it's an environment problem - when it occurred for him previously, it went away of its own accord after a few days.

We have examined the SQL Server installation on their DB server and it's not what I would regard as normal. Although they have SQL 2005 installed (and not 2008), there's an empty "100" folder in the installation directory. There are also MSSQL.1, MSSQL.2, MSSQL.3 and MSSQL.4 folders (MSSQL.4 is where the executables and data are actually stored).

If anybody has any ideas we'd be very grateful - I'm of the opinion that rather than the statistics failing to update, they are somehow becoming corrupt.

Many thanks

Tony

  • I have just accepted answers on a little over half my questions - the ones I haven't done this for were ones that I didn't get a helpful answer on - thanks for pointing this out – tony.wiredin Aug 17 '10 at 18:03
  • This is development/coding so belongs here on SO, not ServerFault. It's fixed by code too. – gbn Aug 17 '10 at 18:50

2 Answers


Disagreeing with Remus...

Parameter sniffing allows SQL Server to guess the optimal plan for a wide range of input values. Sometimes it guesses wrong and the plan is bad because of an atypical value or a poorly chosen default.

I used to be able to demonstrate this on demand by changing a default between 0 and NULL: plan and performance changed dramatically.
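Roughly, the demo looked like this (the procedure and table names here are invented for illustration, not the original code):

```sql
-- Sketch of a parameter-sniffing demo: the plan compiled for the first value seen
-- is reused for every later call, good default or not.
CREATE PROCEDURE dbo.GetOrders
    @CustomerID int = NULL              -- flipping this default between NULL and 0 changed the plan
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;     -- compiled for whatever value was sniffed
END;
GO

EXEC dbo.GetOrders;                     -- first call: plan built for the default (NULL, no rows)
EXEC dbo.GetOrders @CustomerID = 42;    -- later calls reuse that plan, possibly a bad fit
```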

A statistics update invalidates the plan, so the query will be recompiled and a fresh plan cached the next time it runs.

The workarounds are one of the following (the first two are sketched after the list):

  • parameter masking
  • use the OPTIMIZE FOR UNKNOWN hint
  • duplicate "default"
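A minimal sketch of the first two, against the same invented dbo.Orders table (note that OPTIMIZE FOR UNKNOWN needs SQL Server 2008 or later):

```sql
-- 1. Parameter masking: copy the parameter into a local variable so its value
--    cannot be sniffed; the optimiser falls back on average density instead.
CREATE PROCEDURE dbo.GetOrders_Masked
    @CustomerID int
AS
BEGIN
    DECLARE @CustomerIDMasked int;
    SET @CustomerIDMasked = @CustomerID;

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerIDMasked;
END;
GO

-- 2. The hint form of the same idea (SQL Server 2008 onwards):
CREATE PROCEDURE dbo.GetOrders_Unknown
    @CustomerID int
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR UNKNOWN);
END;
GO
```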

See these SO questions

Now, Remus works for the SQL Server development team. However, this phenomenon is well documented by Microsoft on their own website, so blaming developers is unfair.

gbn
  • The queries are not bad queries - the where clauses can use indexes. And an obvious point to note is that this doesn't occur in the staging environment. – tony.wiredin Aug 17 '10 at 19:00
  • thanks. It's way past our bedtime here in the UK (ok our home time) so we'll be looking at your fixes in the morning now – tony.wiredin Aug 17 '10 at 20:39

It's not that the statistics are outdated. What happens is that when you update statistics, all plans are invalidated and the bad cached plan gets evicted. Things run smoothly until a bad plan gets cached again and causes slow execution.

The real question is why you get bad plans to start with. We could get into lengthy technical and philosophical arguments about whether a query processor should create a bad plan in the first place, but the fact is that, when applications are written in a certain way, bad plans can happen. The typical example is a where clause like (@somevariable is null) or (somefield = @somevariable). Ultimately, 99% of bad plans can be traced to developers writing queries with C-style procedural expectations instead of sound, set-based, relational processing.
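To illustrate the pattern (the table and column names here are placeholders, not anything from the poster's application):

```sql
-- The "optional filter" anti-pattern: a single plan must cover both the NULL case
-- (return everything) and the specific case, so the index on CustomerID typically
-- cannot be seeked and the table gets scanned instead.
CREATE PROCEDURE dbo.SearchOrders
    @CustomerID int = NULL
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE (@CustomerID IS NULL) OR (CustomerID = @CustomerID);
END;
GO
```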

What you need to do now is identify the bad queries. It's really easy: just check sys.dm_exec_query_stats; the bad queries will stand out in terms of total_elapsed_time and total_logical_reads. Once you've identified the bad plan, you can take corrective measures, which vary from query to query.
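Something along these lines will surface them (a sketch; adjust the ORDER BY to taste):

```sql
-- Top candidates by cumulative elapsed time; swap the ORDER BY to
-- total_logical_reads to rank by I/O instead.
SELECT TOP (20)
    qs.total_elapsed_time,
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```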

Remus Rusanu
  • @Remus Rusanu: Why does "(@somevariable is null) or (somefield = @somevariable)" cause problems? – Damian Leszczyński - Vash Aug 17 '10 at 18:22
  • @Vash, that can't use an index, so it table scans. Combine that with user transactions on that table and you have blocking and waiting. Read [Dynamic Search Conditions in T-SQL by Erland Sommarskog](http://www.sommarskog.se/dyn-search.html); here is the part on ["(@somevariable is null) or (somefield = @somevariable)"](http://www.sommarskog.se/dyn-search-2005.html#OR_ISNULL) – KM. Aug 17 '10 at 18:26
  • -1 Parameter sniffing can be caused by a poor choice of default or an atypical value. Not all developers are muppets at SQL. I used to be able to demo parameter sniffing on demand. Now, of course, you're on the SQL Server development team and arguably you could be defending your product by blaming developers. Bad plans caused by parameter sniffing *happen* in real life – gbn Aug 17 '10 at 18:38
  • The queries are not bad queries - the where clauses can use indexes. And an obvious point to note is that this doesn't occur in the staging environment. – tony.wiredin Aug 17 '10 at 19:01
  • What is important is that the problem queries can be identified, and once identified they can be fixed. – Remus Rusanu Aug 18 '10 at 00:58
  • @Remus Rusanu: yes, via masking/UNKNOWN etc. to fix a known issue. Some bad plans will be poor SQL skills, but not "99%" as you claim – gbn Aug 18 '10 at 05:57