
What is the best way to read from a single table in SQL Server using multiple threads in C#, while making sure the same record is never read twice across threads?

Thank you in advance for your help.

  • Why must you ensure that the threads aren't reading the same records? For a performance reason or for a business reason? @Despertar is correct in that we need more details to help. – bluevector Jun 02 '12 at 01:26

4 Answers


Are you trying to read records from the table in parallel to speed up retrieving the data, or are you just worried about data corruption from threads accessing the same data?

Database management systems like MS SQL Server handle concurrency extremely well, so thread safety in that respect is not something you need to be concerned with in your code if multiple threads are reading the same table.

If you want to read data in parallel without any overlap, you could run a SQL command with paging and have each thread fetch a different page. You could have, say, 20 threads each read a different page at once, and it would be guaranteed that they are not reading the same rows; then you can concatenate the data. The larger the page size, the more the performance gain outweighs the cost of creating each thread. (A paging sketch follows the link below.)

efficient way to implement paging
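
As a rough illustration (my sketch, not part of the original answer), here is what the per-page query could look like on SQL Server 2012 and later using OFFSET/FETCH; the table and column names are placeholders, and each thread would pass its own page number:

DECLARE @PageNumber INT = 0;    -- each thread uses a different page number
DECLARE @PageSize   INT = 1000; -- rows per page

SELECT Id, x, y, z
FROM dbo.my_table               -- placeholder table and columns
ORDER BY Id                     -- a stable ordering is required for paging
OFFSET (@PageNumber * @PageSize) ROWS
FETCH NEXT @PageSize ROWS ONLY;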

Despertar

Assuming a dependency on SQL Server, you could look at the SQL Server Service Broker features to provide queuing for you. One thing to keep in mind is that SQL Server Service Broker currently isn't available on SQL Azure, so if you had plans to move onto the Azure cloud, that could be a problem.

Anyway, with SQL Server Service Broker the concurrent access is managed at the database engine layer. Another way of doing it is to have one thread that reads the database and then dispatches worker threads with the message as the input. That is slightly easier than trying to use transactions in the database to ensure that messages aren't read twice.
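
To give a sense of the Service Broker approach, here is a minimal dequeue sketch (my illustration, not from the original answer; it assumes a queue named dbo.WorkQueue has already been created along with the usual message type, contract, and service plumbing):

DECLARE @handle UNIQUEIDENTIFIER;
DECLARE @msg    VARBINARY(MAX);

BEGIN TRAN;

-- RECEIVE locks and removes the message atomically, so two concurrent
-- readers can never dequeue the same message.
WAITFOR (
    RECEIVE TOP (1)
        @handle = conversation_handle,
        @msg    = message_body
    FROM dbo.WorkQueue
), TIMEOUT 5000;

-- ... process @msg here ...

COMMIT TRAN;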

Like I said though, SQL Server Service Broker is probably the way to go, or a proper external queuing mechanism.

Mitch Denny

Solution 1:
I am assuming that you are attempting to process or extract data from a large table and want to split the work among threads. If I were assigned this task, I would first look at paging: Thread 1 handles pages 0 to 10, Thread 2 handles pages 11 to 20, and so on. Or you could batch rows using the actual row number, so in your stored proc you would do this:

WITH result_set AS (
  SELECT
    ROW_NUMBER() OVER (ORDER BY <ordering>) AS [row_number],
    x, y, z
  FROM
    table
  WHERE
    <search-clauses>
) SELECT
  *
FROM
  result_set
WHERE
  [row_number] BETWEEN @IN_Thread_Row_Start AND @IN_Thread_Row_End;
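
Each thread would then pass in its own row window. As a hypothetical sketch (the @ThreadIndex and @RowsPerThread names are mine, not the answer's), with a 0-based thread index and a fixed batch size:

DECLARE @ThreadIndex   INT = 0;    -- 0-based index of this worker
DECLARE @RowsPerThread INT = 1000;

DECLARE @IN_Thread_Row_Start INT = @ThreadIndex * @RowsPerThread + 1;
DECLARE @IN_Thread_Row_End   INT = (@ThreadIndex + 1) * @RowsPerThread;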

Another choice, which would be more efficient, is to page using a natural key (or a darn good surrogate) and have each thread pass in key parameters rather than page numbers.

Immediate concerns with this solution would be:

  1. ROW_NUMBER performance
  2. CTE Performance (I believe they are stored in memory)

So if this were my problem to resolve, I would look at paging via a key, along the lines of the sketch below.
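
A minimal keyset sketch (my table and column names, assuming an indexed key column Id): each thread seeks past the last key it handled instead of numbering rows, so the cost stays flat no matter how deep into the table it is.

DECLARE @LastKeySeen INT = 0;    -- each thread starts at its own key boundary
DECLARE @BatchSize   INT = 1000;

SELECT TOP (@BatchSize) Id, x, y, z
FROM dbo.my_table                -- placeholder table and columns
WHERE Id > @LastKeySeen          -- seek on the key instead of ROW_NUMBER
ORDER BY Id;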

Solution 2:
The second solution would be to mark the rows as they are processed, virtually locking them (assuming you have data writer permission). Your table would have a field called Processed or Locked, and as rows are selected by your thread they are updated with Locked = 1.

Then the selects from your other threads pick up only rows that aren't locked. When your process is done and all rows are processed, you can reset the lock. A sketch of the claim step follows.
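
As a hedged sketch of that idea (the table and column names are mine): claiming rows in a single atomic UPDATE avoids a race between the select and the update, and OUTPUT hands back the ids this caller now owns.

-- claim up to 50 unclaimed rows in one atomic statement
UPDATE TOP (50) dbo.my_table
SET Locked = 1
OUTPUT inserted.Id               -- the ids this thread just claimed
WHERE Locked = 0;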

Hard to say what will perform best without some trials... good luck!

virtualadrian

This question is super old but still very relevant, and I spent a lot of time finding this solution, so I thought I'd post it for anyone else who happens along it. This is very common when using a SQL table as a queue rather than MSMQ.

The solution (after a lot of investigation) is simple and can be tested by opening two tabs in SSMS, with each tab running its own transaction to simulate multiple processes/threads hitting the same table.

The quick answer is this: the key is using the UPDLOCK and READPAST hints on your selects.

To illustrate the reads working without duplication, check out this simple example:

-- on tab 1 in SSMS

BEGIN TRAN
SELECT TOP 1 ordno FROM table_queue WITH (UPDLOCK, READPAST)

-- on tab 2 in SSMS

BEGIN TRAN
SELECT TOP 1 ordno FROM table_queue WITH (UPDLOCK, READPAST)

You will notice that the record selected on the first tab is locked and is skipped (not returned again) by the select statement firing on the second tab/process.

Now in the real world you wouldn't just execute a select on your table like the simple example above. If you are using your table as a queue, you would update your records with isprocessing = 1 or something similar. The above code just demonstrates that this allows concurrent reads without duplication.

So in the real world (if you are using your table as a queue and processing it with multiple services, for instance) you would most likely execute your select in a subquery of an update statement.

Something like this.

BEGIN TRAN
    UPDATE table_queue SET processing = 1 WHERE myId IN
    (
        SELECT TOP 50 myId FROM table_queue WITH (UPDLOCK, READPAST)
    )

COMMIT TRAN

You may also combine your update statement with the OUTPUT clause so you have a list of all the ids that are now locked (processing = 1) to work with, for example:
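
Something like this (the same statement as above, with OUTPUT added to return the claimed ids):

BEGIN TRAN

UPDATE table_queue SET processing = 1
OUTPUT inserted.myId             -- the ids this caller just claimed
WHERE myId IN
(
    SELECT TOP 50 myId FROM table_queue WITH (UPDLOCK, READPAST)
)

COMMIT TRAN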

If you are processing data using a table as a queue, this will ensure you do not duplicate records in your select statements, without any need for paging or anything else.

This solution has been tested in an enterprise-level application where we had experienced a lot of duplication in our select statements when the queue was being processed by many services running on many different boxes.

Matt