9

I have a program which does many bulk operations on an SQL Server 2005 or 2008 database (drops and creates indexes, creates columns, full table updates etc), all in one transaction.

Are there any problems to be expected?

  • I know that the transaction log expands even in Simple recovery mode.
  • This program is not executed during normal operation of the system, so locking and concurrency is not an issue.

Are there other reasons to split the transaction into smaller steps?
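
Roughly, the kind of work the program does in one transaction looks like the sketch below (all table, column and index names here are made up; the real statements are generated by the program):

    BEGIN TRANSACTION;

    -- drop an index that would slow down the bulk update
    DROP INDEX IX_Orders_Status ON dbo.Orders;

    -- add a new column (the ALTER has to run in its own batch before a
    -- later statement can reference the new column, hence the GO)
    ALTER TABLE dbo.Orders ADD StatusCode int NULL;
    GO

    -- full-table update to migrate data into the new column
    UPDATE dbo.Orders SET StatusCode = LegacyStatus;

    -- drop the old column and recreate the index
    ALTER TABLE dbo.Orders DROP COLUMN LegacyStatus;
    CREATE INDEX IX_Orders_Status ON dbo.Orders (StatusCode);

    COMMIT TRANSACTION;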

Stefan Steinegger
  • I would offer that you flip the question on its head. Why would you want to do all this work in a single transaction? – John Sansom Jun 08 '11 at 07:16
  • 4
    Because it's simple to implement. I don't have to care about intermediate states. The whole thing is either done or not. It is not just an SQL script, it consists of a bunch of individual classes which are dynamically loaded and executed. – Stefan Steinegger Jun 09 '11 at 05:50

4 Answers

7

In short,

  • Using smaller transactions provides more robust recovery from failure.
  • Long transactions may also unnecessarily hold locks for extended periods on objects that other processes need access to, i.e. blocking.

Consider that if, at any point between the time the transaction started and finished, your server experienced a failure, then in order to bring the database back online SQL Server would have to perform the crash recovery process, which would involve rolling back all uncommitted transactions from the log.

Suppose you developed a data processing solution that is intelligent enough to pick up from where it left off. With a single transaction that option is not available to you, because you would need to start the whole process from the beginning once again.
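
For example, a batched approach along these lines (the table, column and batch size are purely illustrative) commits work as it goes, so after a failure the process can simply be re-run and it will continue with whatever is still unprocessed:

    -- Process rows in small batches, each in its own transaction, so a
    -- crash only loses the batch that was in flight (names are made up).
    DECLARE @batch int;
    SET @batch = 10000;

    WHILE 1 = 1
    BEGIN
        BEGIN TRANSACTION;

        UPDATE TOP (@batch) dbo.Orders
        SET    StatusCode = LegacyStatus
        WHERE  StatusCode IS NULL;      -- NULL marks "not processed yet"

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT TRANSACTION;
            BREAK;                      -- nothing left to do
        END

        COMMIT TRANSACTION;
    END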

John Sansom
1

If the transaction causes too many database log entries (updates), the log can hit what is known as the "high water mark": the point at which the log reaches (about) half of its absolute maximum size, when it must then commence rolling back all the updates (which will consume about the same amount of log space as it took to do them).

Not rolling back at that point would risk eventually reaching the maximum log size without having finished the transaction or hit a ROLLBACK command, at which point the database is screwed because there isn't enough log space left to roll back.
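
Either way, you can watch how much of the log the transaction is consuming while it runs, e.g. with the standard commands below:

    -- How full is each database's transaction log right now?
    DBCC SQLPERF (LOGSPACE);

    -- Size and growth limit of the log file(s) of the current database
    SELECT name,
           size / 128 AS size_mb,       -- size is stored in 8 KB pages
           CASE max_size WHEN -1 THEN NULL ELSE max_size / 128 END AS max_size_mb
    FROM   sys.database_files
    WHERE  type_desc = 'LOG';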

Bohemian
1

It isn't really a problem until you run out of disk space, but you'll find that a rollback will take a long time. I'm not saying you should plan for failure, of course.

However, consider the process rather than the transaction log as such. I'd consider separating (rough sketch below):

  • DDL into one transaction
  • Bulk loads into staging tables in another transaction
  • Flushing data from staging into the final tables in a third transaction
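
Something along these lines, where all the table and column names are only placeholders:

    -- 1) DDL in its own transaction
    BEGIN TRANSACTION;
        ALTER TABLE dbo.Orders ADD StatusCode int NULL;
        CREATE TABLE dbo.Orders_Staging (OrderId int PRIMARY KEY, StatusCode int);
    COMMIT TRANSACTION;
    GO

    -- 2) Bulk load the staging table in another transaction
    BEGIN TRANSACTION;
        INSERT INTO dbo.Orders_Staging (OrderId, StatusCode)
        SELECT OrderId, LegacyStatus
        FROM   dbo.Orders;
    COMMIT TRANSACTION;
    GO

    -- 3) Flush from staging to the final table in a third transaction
    BEGIN TRANSACTION;
        UPDATE o
        SET    o.StatusCode = s.StatusCode
        FROM   dbo.Orders AS o
        JOIN   dbo.Orders_Staging AS s ON s.OrderId = o.OrderId;

        DROP TABLE dbo.Orders_Staging;
    COMMIT TRANSACTION;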

If something goes wrong I'd hope that you have rollback scripts and/or a backup.

Is there really a need to do everything atomically?

gbn
  • No, it doesn't need to be done atomically, but it has been the easiest approach until now. I don't need to care about rollback and restarting the whole thing. I can't separate the DDL that easily, because there are many places where I add new columns or tables, migrate data from existing tables, and then drop old columns or tables. – Stefan Steinegger Jun 08 '11 at 11:47
0

Depending on the complexity of your update statements, I'd recommend doing this only on small tables of, say, a few hundred rows, especially if you only have a small amount of main memory available. Otherwise, updates on big tables can take a very long time and even appear to hang. It then becomes difficult to figure out what the process (spid) is doing and how long it might take.
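
To at least see whether such a long-running statement is still making progress, you can look at the session in the DMVs, for example:

    -- What is that session doing right now? (55 is just an example spid;
    -- percent_complete is only populated for some commands such as
    -- BACKUP, DBCC and ROLLBACK, not for ordinary UPDATEs)
    SELECT session_id, status, command, wait_type, wait_time,
           percent_complete, cpu_time, total_elapsed_time
    FROM   sys.dm_exec_requests
    WHERE  session_id = 55;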

I'm not sure whether "Drop index" is a transaction-logged operation anyway. See this question here on stackoverflow.com.

knb