
I'm looking for a solution to a thorny problem.

My colleagues and I have built an 'engine' (in C#) that performs various processing jobs ("elaborations") on a SQL Server database.

Initially, this processing lived in many stored procedures called in series by a nightly batch. That system had many flaws.

Now we have extracted every single query from each stored procedure and, strange as it may sound, stored the queries themselves in the DB.

(Note: there are several reasons and I'm not listing them all; you just need to know that, for business reasons, we cannot make frequent software releases... but we have a lot of freedom with SQL scripts.)

Mainly, the logic behind our engine is as follows (a rough sketch of the model appears after the list):

  • there are Phases, which are executed sequentially
  • each Phase contains several Steps, grouped into Sets
  • a Set is a group of Steps that are executed sequentially
  • unless otherwise specified, the Sets run in parallel with each other
  • a Step that does not belong to any Set is wrapped in a Set created at runtime
  • before starting, a Set may have to wait for the completion of one or more Steps
  • a Step corresponds to an atomic (or nearly atomic) SQL query or a C# method to run
  • at startup the engine queries the database and composes the Phases, Steps, and Sets (and their configurations) that will then be executed

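To make this more concrete, here is a minimal sketch of how the configuration could be modelled; the class and property names below are illustrative, not our actual code:

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model of the engine's configuration (names are illustrative).
public class Phase
{
    public string Name { get; set; }
    public bool RequiresTransaction { get; set; }   // if true, the whole phase must commit or roll back together
    public List<Set> Sets { get; set; } = new List<Set>();
}

public class Set
{
    public string Name { get; set; }
    public List<string> WaitForSteps { get; set; } = new List<string>();  // steps that must finish before this Set starts
    public List<Step> Steps { get; set; } = new List<Step>();             // executed sequentially within the Set
}

public class Step
{
    public string Name { get; set; }
    public string SqlText { get; set; }       // the query loaded from the DB at startup...
    public string CSharpMethod { get; set; }  // ...or the name of a C# method to invoke instead
}
```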

We have created the engine, we have all the configurations... and everything works.

However, we have a requirement: some phases must be transactional. If even a single step of such a phase fails, we need to roll back the entire phase.

What creates problems is the management of the transaction.

Initially we created a single connection and transaction for the entire phase, but we soon realized that, because of the multithreading, this is not thread-safe.
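For clarity, that first (broken) attempt looked roughly like the sketch below, using the hypothetical Phase/Set/Step classes from above: one SqlConnection and one SqlTransaction shared by all the Sets running in parallel, which ADO.NET does not support because SqlConnection is not thread-safe.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

static class SharedTransactionSketch
{
    // Simplified sketch of the first (broken) approach: a single connection and transaction
    // shared by all the Sets of a phase while they run in parallel. SqlConnection is not
    // thread-safe, so concurrent commands on the same connection/transaction misbehave.
    public static void RunPhaseWithSharedTransaction(Phase phase, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                Parallel.ForEach(phase.Sets, set =>
                {
                    foreach (var step in set.Steps)
                    {
                        using (var command = new SqlCommand(step.SqlText, connection, transaction))
                        {
                            command.ExecuteNonQuery();   // several threads hitting one connection: not supported
                        }
                    }
                });
                transaction.Commit();
            }
        }
    }
}
```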

In addition, after several tests, we got exceptions related to the transaction itself. Apparently, when a phase contains a LOT of steps (i.e. many database queries), at some point the same transaction cannot execute any further statements.

So now we have changed the approach: each step of a phase that requires a transaction opens its own connection and transaction, and if everything goes well, they all commit at the end (otherwise they all roll back).
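In other words, something along these lines (a rough sketch with hypothetical names, again reusing the Step class from the sketch above; the real engine also has to synchronize the list when steps run in parallel):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static class PerStepTransactionSketch
{
    // Rough sketch of the current approach: each step of a transactional phase opens its own
    // connection and transaction; all transactions are kept open until the phase has finished,
    // then they are all committed (or all rolled back if any step failed).
    public static void RunTransactionalStep(Step step, string connectionString,
        List<(SqlConnection Connection, SqlTransaction Transaction)> openTransactions)
    {
        var connection = new SqlConnection(connectionString);
        connection.Open();
        var transaction = connection.BeginTransaction();
        openTransactions.Add((connection, transaction));

        using (var command = new SqlCommand(step.SqlText, connection, transaction))
        {
            command.ExecuteNonQuery();
        }

        // At the end of the phase, roughly:
        //   foreach (var t in openTransactions) t.Transaction.Commit();   // or .Rollback() on failure
        //   foreach (var t in openTransactions) t.Connection.Dispose();
    }
}
```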

It works. However, we have noticed a limitation: the use of temporary tables.

In a transactional phase, when I create a local temp table (#TempTable1) in step x, I can't use #TempTable1 in the next step y (SELECT TOP 1 1 FROM #TempTable1).

This is logical: step y runs on a separate connection and transaction, and #TempTable1 is scoped to the session that created it, so it is not visible there (and is dropped when that session's execution ends).
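The behaviour can be reproduced with two plain connections (a minimal sketch, hypothetical method name): a local #temp table only exists in the session that created it, so a second connection never sees it.

```csharp
using System.Data.SqlClient;

static class TempTableScopeSketch
{
    // Local temp tables are scoped to the session (connection) that created them, so a second
    // connection, like step y running in its own transaction, cannot see #TempTable1 at all.
    public static void DemonstrateTempTableScope(string connectionString)
    {
        using (var stepX = new SqlConnection(connectionString))
        using (var stepY = new SqlConnection(connectionString))
        {
            stepX.Open();
            stepY.Open();

            using (var create = new SqlCommand("CREATE TABLE #TempTable1 (Id INT);", stepX))
                create.ExecuteNonQuery();

            // Throws "Invalid object name '#TempTable1'": the table lives only in stepX's session.
            using (var select = new SqlCommand("SELECT TOP 1 1 FROM #TempTable1;", stepY))
                select.ExecuteScalar();
        }
    }
}
```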

Then we tried a global temp table, ##TempTable2, but in step y the SELECT is blocked until the timeout expires (presumably because step x's still-open transaction holds locks on ##TempTable2).

I also tried lowering the transaction isolation level, but that didn't help.

Now we are in the unfortunate situation of having to create real tables instead of using temporary tables.

I'm looking for a compromise between using transactions across a large number of steps and using temporary tables. I believe the heart of the matter is transaction management. Suggestions?

  • You could use a UID as a transaction ID tied to your transient data, so that you do `select stuff from transtable where UID = {mytrans}`; this allows many, many transactions to be happening at the same time – BugFinder Nov 07 '17 at 13:06
  • Holy complicated... Just a thought, but have you tried using real tables (frequently referred to as "staging" or "transient" tables)? These can live permanently in the db or be transient. It really doesn't matter. The point being, they're materialized and accessible to all/any query at any point. In my production systems I typically drop them when finished to keep the schema "clean". – pim Nov 07 '17 at 13:50
  • @BugFinder I'm not sure how I can do this – Gioce90 Nov 07 '17 at 21:07
  • @pim yeah, it is complicated but necessary... But I don't like your approach, it seems a little "dirty" to me. The question was "Can multithreading, transactions and temporary tables work together?" – Gioce90 Nov 07 '17 at 21:12
  • @Gioce90 It is a very common solution to many ETL workloads, so I'm not really sure where your bias is coming from. Also, I think it's incorrect to make assumptions about a solution without actually trying it. Physical tables will be written to the actual data files instead of tempdb, will get you performance benefits and, most importantly, solve your problem. I would highly consider putting your personal, and unfounded, bias aside and giving it a shot. – pim Nov 07 '17 at 21:30
  • @Gioce90 To answer your specific question: One thing to consider with global temp tables is that they are automatically dropped when the session that created the table ends and all other tasks have stopped referencing them. This means that if there is even a microsecond delay where the table is no longer being referenced, it might be dismantled. – pim Nov 07 '17 at 21:41
  • @Gioce90 You might also want to consider physical tables in the tempdb instead of a global temp. Created like: `USE tempdb CREATE TABLE t` – pim Nov 07 '17 at 21:41
  • @pim the problem remains. The table (temp or not) still stays locked by the transaction – Gioce90 Nov 10 '17 at 13:34
  • @Gioce90 if we knew more about the actual SQL statements you were executing, then perhaps we could provide a more educated opinion. For example, I've had great success in high-concurrency environments with the infamous "JFDI" (just f$&@ing do it) pattern proposed by SO legend [gbn](https://stackoverflow.com/users/27535/gbn). You can find examples [here](https://stackoverflow.com/questions/3407857/only-inserting-a-row-if-its-not-already-there) and [here](https://stackoverflow.com/questions/3593870/select-insert-version-of-an-upsert-is-there-a-design-pattern-for-high-concurr). – pim Nov 10 '17 at 14:12

0 Answers