4

I'm having this argument about using Cursors in TSQL recently...

First of all, I'm not a cheerleader in the debate. But every time someone says cursor, there's always some knucklehead (or 50) who pounce with the obligatory 'cursors are evil' mantra. I know SQL-Server was optimized for set-based operations, and maybe cursors truly ARE evil incarnate, but if I wanted to put some objective thought behind that...

Here's where my mind is going:

  1. Is the only difference between cursors and set operations one of performance?

    Edit: There's been a good case made for it not being simply a matter of performance -- such as running a single batch over-and-over for a list of id's, or alternatively, executing actual SQL text stored in a table field row-by-row.

  2. Follow-up: do cursors always perform worse?

    • EDIT: @Martin shows a good case where Cursors out-perform set-based operations fairly dramatically. I suspect that this wouldn't be the kind of thing you'd do too often (before you resorted to some kind of OLAP / Data Warehouse kind of solution), but nonetheless, seems like a case where you really couldn't live without a cursor.
    • reference to TPC benchmarks suggesting cursors may be more competitive than folks generally believe.
    • reference to memory-usage optimizations for cursors since Sql-Server 2005
  3. Are there any problems you can think of, that cursors are better suited to solve than set-based operations?

    • EDIT: Set-based operations literally cannot Execute stored procedures, etc. (see edit for item 1 above).
    • EDIT: Set-based operations are exponentially slower than row-by-row when it comes to aggregating over large data sets.

  • Article from MSDN explaining their perspective of the most common problems people resort to cursors for (and some explanation of set-based techniques that would work better.)
  • Microsoft says (vaguely) in the 2008 Transact SQL Reference on MSDN: "...there are times when the results are best processed one row at a time", but the don't give any examples as to what cases they're referring to.

Mostly, I'm of a mind to convert cursors to set-based operations in my old code if/as I do any significant upgrades to various applications, as long as there's something to be gained from it. (I tend toward laziness over purity a lot of the time -- i.e., if it ain't broke, don't fix it.)

General Grievance
  • 4,555
  • 31
  • 31
  • 45
Chains
  • 12,541
  • 8
  • 45
  • 62
  • 1
    "if it isn't broken, don't fix it." - I think you were aiming for "If it aint broke, dont fix it" :-) – Damien_The_Unbeliever Jul 29 '11 at 18:39
  • @Damien -- well see, I broke it already. :-) – Chains Jul 29 '11 at 18:40
  • I have 4 really insightful answers so far, and considering that they're all valid, I'm just going to mark the one with the most votes the answer. I summarized the answers in my original question, and I hope folks will still contribute to this if they see something that got left out. For example -- one thing I still want to test is aggregating strings in a field (i.e., creating a delimited list), and whether or not cursors outperform set-based solutions over large data sets, such as the coalesce solution at the link in question 3 above (Article from MSDN). Anyway, thank you. – Chains Aug 01 '11 at 14:56

4 Answers4

4

To answer your question directly:

I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too variant.

That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.

And for what I really want to say:

When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).

This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.

Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.

Michael Ames
  • 2,607
  • 1
  • 16
  • 22
  • @Michael-- Is there any chance that I could get a link to those questions? For personal posterity? – RLH Jul 29 '11 at 18:48
  • I give them verbally, but would be glad to write them down for you. Give me a few days and I'll type them up and post them somewhere. :) – Michael Ames Jul 29 '11 at 18:53
  • re: what you really want to say -- That's a really good observation, actually -- I've noticed that to some extent, in working with other program analysts at my institution. – Chains Jul 29 '11 at 18:56
  • +1 -- Thank you -- also, I agree with the rule of thumb re: looking for set-based solutions before resorting to cursors. – Chains Jul 29 '11 at 22:30
  • Giving you the check-mark... In case you want to look at them, t-clausen and damien actually came up with two specific cases where you really had no set-based solution, and martin came-up with one where performance was a pretty serious issue. But anyway, I thought your answer was really insightful on a more general level, and definitely gives a lot of good wisdom / advice. Thanks. – Chains Aug 01 '11 at 15:03
3

Running Totals is the classic case where as the number of rows gets larger cursors can out perform set based operations as despite the higher fixed cost of the cursor the work required grows linearly rather than exponentially as with the set based "triangular join" approach.

Itzik Ben Gan does some comparisons here.

Graph

Denali has more complete support for the OVER clause however that should make this use redundant.

Ryan M
  • 18,333
  • 31
  • 67
  • 74
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Maybe I'm missing something, but a recursive CTE can also replace a cursor very well for this task, even if in the end it [cannot](http://explainextended.com/2009/11/18/sql-server-are-the-recursive-ctes-really-set-based/) be considered strictly a set-based approach. – Andriy M Jul 29 '11 at 22:53
  • @Andriy - [The comments here indicate that the recursive CTE is very slow for this](http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx) not sure I've ever seen any figures on this though. – Martin Smith Jul 29 '11 at 22:57
2

Since I've seen people manage to re-implement cursors (in all there varied forms) using other TSQL constructs (usually involving at least one while loop), there's nothing that cursors can achieve that can't be done using other constructs.

That's not to say that the re-implementations aren't equally as inefficient as the cursors that were avoided by not including the word "cursor" in that solution. Some people seem to purely hate the word, not the mechanics.

One place I've successfully argued to keep cursors was for a data transfer/transform between two different databases (we were dealing with clients here). Whilst we could have implemented this transfer in a set based manner (indeed, we previously had), there was problematic data that could cause issues for a few clients. In a set based solution, we had either to:

  • Continue the transfer, excluding failed client data at each table, leaving those clients partially transferred, or,
  • abort the entire batch

Whereas, by making the unit of transfer the individual client (using a cursor to select each client), we could make each client's transfer between the systems either work fully or be entirely rolled back (i.e. place each transfer in its own transaction)

I can't think of any situations where I've wanted to use a cursor below the "top level" of such transfers though (e.g. selecting which client to transfer next)

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • +1 for the example, however the OP was comparing cursors to set-based operations, so talking about reimplementing cursors with forms of looping isn't really relevant. – Davy8 Jul 29 '11 at 19:24
  • @Davy8 - the original form of question 3 asked whether there were things you could do with cursors that couldn't be done in other ways, hence the first two paragraphs. – Damien_The_Unbeliever Jul 29 '11 at 19:28
  • unless it's just now showing up in the edit history (which is possible, I've noticed really quick edits sometimes don't show up) it looks like question 3 was always about cursors vs set-based, it was just changed from asking about what was possible to asking whether there's anything cursors solve better. – Davy8 Jul 29 '11 at 21:26
  • +1 -- thank you. That's a good case (similar to @t-clausen.dk) -- which makes me revise the question now :-) to differentiate between 'sets of data' versus 'sets of execution statements'. – Chains Jul 29 '11 at 22:33
2

Often when you build dynamic sql, you have to use cursors. Imagine a script that search through all tabels in the database for same value in different fields. Best solution will be a cursor. Question where the problem was raised is here How to use EXEC or sp_executeSQL without looping in this case? I will be really impressed if anyone can solve that better without a cursor.

Community
  • 1
  • 1
t-clausen.dk
  • 43,517
  • 12
  • 59
  • 92
  • 1
    Agreed. Dynamic SQL is a whole different ballgame. – Michael Ames Jul 29 '11 at 20:18
  • +1 -- thank you. That's a good case (similar to what @Damien_The_Unbeliever is talking about) -- which makes me revise the question now :-) to differentiate between 'sets of data' versus 'sets of execution statements' – Chains Jul 29 '11 at 22:34