
I have a data-upload function that loads some data into several tables and processes it. I want my users to be able to initiate two uploads at once (though this will be the exception rather than the rule), but in that instance I want the first upload to complete before the second upload begins. (The whole process should take around 20 seconds.)

The reason for this is that if two uploads put data into the tables at the same time, they'll interfere with one another's data. (Side note: I originally planned to use temp tables to achieve process isolation, but found I couldn't do that.)

What's the best way to ensure that upload 2 waits until upload 1 is finished? I could create a table to use as a mutex and grab a lock on that table (or a known row) inside a transaction that wraps the whole process, but is there a cleaner or more efficient way to do it?
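
For reference, the mutex-table approach I have in mind would look roughly like this (the `UploadMutex` table and its row are hypothetical, purely for illustration):

```sql
-- One-time setup: a mutex table with a single known row.
-- CREATE TABLE UploadMutex (LockName varchar(50) PRIMARY KEY);
-- INSERT INTO UploadMutex (LockName) VALUES ('upload');

BEGIN TRANSACTION;

-- Take an update lock on the known row and hold it until commit;
-- a second upload blocks here until the first one has finished.
DECLARE @LockName varchar(50);
SELECT @LockName = LockName
FROM UploadMutex WITH (UPDLOCK, HOLDLOCK)
WHERE LockName = 'upload';

-- ...load the data into the tables and process it here...

COMMIT TRANSACTION;
```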

Gary McGill
    FFS. I originally titled this question "what is the best way to..." rather than "how can I..." and got the message that the question was likely to be closed as "subjective". How the heck are we supposed to get advice on the best way to do something? Isn't that what the site is for? – Gary McGill Mar 05 '10 at 13:10
  • "what is the best way..." is subjective that one way is better than all the rest. "how can i..." doesn't suppose that any of the solutions are better than another. you just have to trust that the "best way" will get more up votes than any other. –  Mar 05 '10 at 14:07

4 Answers


I would recommend putting each upload in its own transaction. That way they will not interfere.
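
A minimal sketch of the idea, assuming the whole load-and-process step runs under a serializable transaction (the staging table and processing procedure are placeholders, not from the question):

```sql
-- Run the whole upload as one serializable transaction, so that two
-- concurrent uploads behave as if they had run one after the other.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- The data load (a placeholder for the real upload).
INSERT INTO UploadStaging (CustomerId, Amount)
VALUES (1, 100.00);

-- The processing step (a placeholder procedure).
EXEC dbo.ProcessUploadStaging;

COMMIT TRANSACTION;
```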

schoetbi

Make a serialisable sproc that issues a unique identifier and tag the uploads with that ID value. Then they can go into the same table. The uploader can then initiate a batch process tagged with that ID to merge the data into the destination.
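
A sketch of what such a sproc could look like, with illustrative object names (the mechanics are spelled out in the comments below):

```sql
-- One row per upload batch; the IDENTITY column is the batch ID, and
-- the extra column leaves room for recording status information later.
CREATE TABLE dbo.UploadBatch
(
    BatchId   int IDENTITY(1,1) PRIMARY KEY,
    CreatedAt datetime NOT NULL DEFAULT GETDATE()
);
GO

CREATE PROCEDURE dbo.GetUploadBatchId
    @BatchId int OUTPUT
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
        INSERT INTO dbo.UploadBatch DEFAULT VALUES;
        -- SCOPE_IDENTITY() rather than @@IDENTITY, so that a trigger
        -- on the table cannot interfere with the value read back.
        SET @BatchId = SCOPE_IDENTITY();
    COMMIT TRANSACTION;
END
```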

ConcernedOfTunbridgeWells
  • I'm not quite sure what you mean by a "serialisable" sproc? However, the technique you describe of having an ID column so that each upload can use a different ID is exactly where I started. Then I realised that because I'm using BULK INSERT to load the data, I can't actually set the ID on the newly-added records as they're created - and so I can't make sure that they're tagged with the correct ID. Unless you know of a way to do that? – Gary McGill Mar 05 '10 at 14:46
  • Serialisable - the sproc sets the transaction isolation level to serialisable for the duration of the transaction that generates and issues the upload batch ID. It begins the transaction, gets the ID (perhaps inserting to a table with an identity column and reading @@identity), commits the transaction and returns. This gives you a nice, atomic, monotonically increasing batch ID that you can record status information against later. – ConcernedOfTunbridgeWells Mar 05 '10 at 17:44
  • Sorry, perhaps I'm being dense, but coming up with a batch ID is not a problem (and nor is it a solution to my actual problem). Unless I can do my BULK INSERT in such a way that all the new records are tagged with the batch ID as soon as they're created, I'm no further forward. I'd have to wrap *everything* inside a transaction, not just the creation of the ID? Which is pretty much what everyone else is suggesting. – Gary McGill Mar 05 '10 at 17:59
  • The batch ID can be used to separate uploads, which allows concurrent inserts without causing any problems differentiating the batches from each other. The upload function populates the table and then initiates processing by batch ID. You can use SQLBulkLoad (IIRC) with .net if you need to force the batch ID, or some other means to tag it onto the rows. Wrapping the inserts into a transaction with isolation at serializable or read committed and a table lock would allow you to order the transactions but this is not necessary if you can identify the batches in some other way. – ConcernedOfTunbridgeWells Mar 05 '10 at 21:07
  • I've realised how I can get this to work. You tried several times to get me to understand how and why to use a batch ID; I tried several times to get you to understand that I already understood that bit(!). What I was missing was that I *don't* need a transaction around my whole upload process (which is what I was trying to avoid). I only need to transactionalize the very small 2-step process of BULK INSERT-ing the data and then setting the batch ID. (My batch ID field needs to be nullable for this to work with BULK INSERT). Thanks for your help. – Gary McGill Mar 08 '10 at 13:39
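
A sketch of the arrangement described in that last comment, reusing the hypothetical `dbo.GetUploadBatchId` procedure from above (the file path, table and column names are likewise placeholders):

```sql
-- Issue a batch ID first; this needs no surrounding transaction.
DECLARE @BatchId int;
EXEC dbo.GetUploadBatchId @BatchId OUTPUT;

-- Only the two-step load-and-tag process needs to be atomic. The
-- BatchId column must be nullable so that BULK INSERT can leave it NULL.
BEGIN TRANSACTION;

BULK INSERT dbo.UploadStaging
FROM 'C:\uploads\data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Claim the rows that were just loaded and are not yet tagged.
UPDATE dbo.UploadStaging
SET BatchId = @BatchId
WHERE BatchId IS NULL;

COMMIT TRANSACTION;

-- The rest of the processing can then run outside the transaction,
-- filtering on @BatchId.
```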

Transactions are supposed to provide isolation. Can you explain why they would not for your scenario?

Kent Boogaart
  • If I wrap the whole thing in a transaction, then *everything* I do will be transactionalized, and so every table I touch during the upload process will be blocked to some extent. I'm only interested in blocking access to the particular tables used during the upload, and I'd rather that all the other tables were not affected. So yes, this would provide isolation - but I'm hoping that there might be another way that would have less of an overhead. – Gary McGill Mar 05 '10 at 14:43

In all your upload routines, request an exclusive table lock (`TABLOCKX`) on each affected table. That way the second upload will wait until the existing lock is released.

GSerg
  • Can you give me a short example of how I should do that? – Gary McGill Mar 05 '10 at 14:47
  • @Gary McGill, `DECLARE @Count int; BEGIN TRANSACTION; SELECT @Count=COUNT(*) FROM YourTable WITH (TABLOCKX); /*your processing here on YourTable*/ COMMIT;` – KM. Mar 05 '10 at 15:15