7

I've been working on Coldfusion/MS SQL for years and this is one of the strangest problems I've seen. The problem itself has since been resolved but I don't really understand what happened; this question is attempting to get a clear idea of the likely cause.

The Problem

In a stable production environment, for no apparent reason, one query starts returning in around 1,000-1,500 ms (about 10 times slower than usual). I was able to isolate it to this:

<cfquery datasource="#ds#" name="query">
   select 1
   from eLearning.v_courseCompletion cc
   where 
      cc.memberIncId = <cfqueryparam value="3" cfsqltype="cf_sql_integer"> and
      cc.courseId = <cfqueryparam value="25" cfsqltype="cf_sql_integer"> and 
      cc.currentCourseCompleted = 1
</cfquery>

What's strange is, this behaviour is aggravated when within a loop, even when there's a single iteration, like in this example:

<cfloop from="1" to="1" index="i">
   <cfquery datasource="#ds#" name="query">
      select 1
      from eLearning.v_courseCompletion cc
      where 
         cc.memberIncId = <cfqueryparam value="3" cfsqltype="cf_sql_integer"> and
         cc.courseId = <cfqueryparam value="25" cfsqltype="cf_sql_integer"> and 
         cc.currentCourseCompleted = 1
   </cfquery>
</cfloop>

This should be exactly the same as above, right? The loop should have no effect but instead, this test runs about 10 times slower, returning in between 7,000-16,000 ms. This is how the problem was detected; the query (buried within a object method) was being called from the body of a loop, if the loop iterated more than 5 or 6 times the request would timeout.

To me, this indicated a problem on the Coldfusion side but restarting the service, or indeed the machine, did nothing.

Meanwhile, once isolated, I noticed that making any change to the query itself caused the performance to return to the expected level, around 150-190 ms. For example:

  • Changing the selected fields (ie. select *)
  • Removing the table alias (cc)
  • Replacing either <cfqueryparam> with an inline value
  • Removing any of the conditions

Any of these changes "fixed" the problem but, when running the original query, the performance problem would return.

The Solution

At this point I guessed the query's execution plan had been corrupted or something, did some Googling, and ran DBCC FREEPROCCACHE against the DB server. This fixed the problem immediately. Great, problem solved....

The Question

Since then though, I've done a bit more research and the consensus seems to be that execution plans "don't get corrupted". There's a some talk of similar problems occurring with stored procedures and parameter sniffing but I'm not using any sp's here. We are selecting from a fairly complicated view though (eLearning.v_courseCompletion) with nested joins. Is that the issue?

Basically, what actually happened here? How to I stop it from happening again?

.. and what the hell is the connection to loops in CF?!?

Versions

  • Coldfusion 9.0.2.282541 (64 bit)
  • SQL Server Express 10.50.4297 (64 bit)
  • Both servers are Win Server 2008 R2 Datacenter (64 bit)
Community
  • 1
  • 1
Molomby
  • 5,859
  • 2
  • 34
  • 27

1 Answers1

6

You are using a stored procedure under the hood when you use a cfqueryparam. When you don't use a cfqueryparam, your query is simply sent in as a "free text" batch query. When you use a cfqueryparam, you are sending your query in to be executed using sp_executeSQL() which itself will send your query body in as a parameter. This allows the query plan to cache. If it sees the same query, it will use the stats it has saved for that particular plan. This means, that if it ran with some really oddball data and got a bad idea for executing the query, the next iterations would all use the same plan, which is a "bad plan" for 99% of the use cases of this query, but maybe a good plan for that one oddball instance.

Each query that is executed with sp_execute SQL also returns a numeric handle that the JDBC driver can use to simply tell SQL which plan it can use, basically a shortcut. This is called "max pooled statements" in your DSN settings in CFadmin. Having that set to 0 or 1000 does not affect the fact that you will be taking advantage of a plan cache with sp_executeSQL.

http://blogs.msdn.com/b/turgays/archive/2013/09/18/exec-vs-sp-executesql.aspx

StackOverflow had a good demonstration of this, if one particular power user would load their account page with their millions of badges and points before the query stats were built, it would mess up the stats for every other user who has but a few hundred or so points and a handful of badges, making the page slow for him or her.

J.T.
  • 2,606
  • 15
  • 31
  • Ah fantastic, that makes a lot of sense. Any insight into why running it from within a loop (even 1 iteration) would impact performance so much? Thanks – Molomby Oct 10 '14 at 05:27
  • No, the loop doesn't sound like the culprit. I would blame some other circumstance that the loop provides, such as the index variable or something which is making the loop seem like it is to blame. Also, you can almost always get away with not running cfquery in a loop. There are very few times when this actually makes sense. – J.T. Oct 10 '14 at 12:23
  • Thanks for the follow up but I assure you there's nothing else in the loop that could (should?) be causing this; the code blocks posted above are straight out of my test cases (plus a `getTickCount()` before and after). The loop's index (or item) variable isn't referenced at all by the query and there's nothing else in the loop body. Presumably something's going haywire during the loops compilation to Java? If I can reproduce the issue I'll dig a bit deeper. – Molomby Oct 13 '14 at 00:23
  • Sorry man, I think you can probably isolate it, but I've never heard of this. Also, if you are interested in seeing an alternative to running your query in a cfloop, post a question and someone will hook you up. – J.T. Oct 13 '14 at 12:39
  • Optimising this specific case by refactoring out the loop wouldn't be hard (but it'll break the abstraction of the ORM). I'm more concerned with preventing this category of problem if possible. Thanks though – Molomby Oct 15 '14 at 00:34
  • Ah, I see. Yeah, I'm not sure what to tell, you've reached the end of my knowledge. You could always decompile the class file it generates to look for something funky. – J.T. Oct 15 '14 at 12:35