
I want to use a database table as a queue. I want to insert into it and take elements from it in the order they were inserted (FIFO). My main consideration is performance, because I have thousands of these transactions each second. So I want to use a SQL query that gives me the first element without scanning the whole table. I do not remove a row when I read it. Does SELECT TOP 1 ..... help here? Should I use any special indexes?

Brian Tompsett - 汤莱恩
Shayan
  • See this article for a good description of how to implement a queue in SQL Server: http://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/ – Matthew Murdoch Mar 29 '13 at 08:10
  • using row-based logic such as processing a queue in sql server is a massive misuse of resources.. use sql for set-based logic – Erik Bergstedt Dec 14 '16 at 12:56

9 Answers


I'd use an IDENTITY field as the primary key to provide the uniquely incrementing ID for each queued item, and stick a clustered index on it. This would represent the order in which the items were queued.

To keep the items in the queue table while you process them, you'd need a "status" field to indicate the current status of a particular item (e.g. 0=waiting, 1=being processed, 2=processed). This is needed to prevent an item being processed twice.
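For illustration, a minimal sketch of such a table (using the names assumed in the procedure below; the payload column is just a placeholder for whatever data each item carries):

CREATE TABLE MyQueueTable
(
    ID INT IDENTITY(1,1) NOT NULL,
    Status TINYINT NOT NULL DEFAULT 0,       -- 0=waiting, 1=being processed, 2=processed
    Payload NVARCHAR(MAX) NULL,              -- illustrative: the data for the queued item
    CONSTRAINT PK_MyQueueTable PRIMARY KEY CLUSTERED (ID)   -- clustered index on the IDENTITY column
)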

When processing items in the queue, you'd need to find the next item in the table that is NOT currently being processed. This needs to be done in a way that prevents multiple processes from picking up the same item at the same time, as demonstrated below. Note the table hints UPDLOCK and READPAST, which you should be aware of when implementing queues.

e.g. within a sproc, something like this:

DECLARE @NextID INTEGER

BEGIN TRANSACTION

-- Find the next queued item that is waiting to be processed
SELECT TOP 1 @NextID = ID
FROM MyQueueTable WITH (UPDLOCK, READPAST)
WHERE Status = 0
ORDER BY ID ASC

-- If we've found one, mark it as being processed
IF @NextID IS NOT NULL
    UPDATE MyQueueTable SET Status = 1 WHERE ID = @NextID

COMMIT TRANSACTION

-- If we've got an item from the queue, return it to whatever is going to process it
IF @NextID IS NOT NULL
    SELECT * FROM MyQueueTable WHERE ID = @NextID

If processing an item fails, do you want to be able to try it again later? If so, you'll need to reset the status back to 0, or something along those lines. That will require more thought.
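As a minimal sketch of the "reset to waiting" approach (error handling around the processing step itself is assumed):

-- Sketch: put a failed item back so it can be picked up again later
UPDATE MyQueueTable SET Status = 0 WHERE ID = @NextID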

Alternatively, don't use a database table as a queue, but something like MSMQ - just thought I'd throw that in the mix!

AdaTheDev
  • Why should I separate select id from select *? – Shayan Feb 01 '10 at 16:07
  • You don't have to, you could load all the values that you need into variables at the same time as the first SELECT, and then return them at the end. Also, I've done "SELECT *" for simplicity - just return the fields you actually need. – AdaTheDev Feb 01 '10 at 16:12
  • I'd like to keep the processed field in a different table with a foreign key to this table, to minimize the locking effect between different parts of the program. Does this method help? What kind of index should I use for it? – Shayan Feb 01 '10 at 16:15
  • You could use the queue table as just a mechanism for queueing, and store more detail on the specifics of what to process in a related table away from the central queue table. That approach can work nicely especially if the fields you split out are to be updated during processing. Can also be nice if you have different types (schema) of messages in the queue. – AdaTheDev Feb 01 '10 at 16:22

If you do not remove your processed rows, then you are going to need some sort of flag that indicates that a row has already been processed.

Put an index on that flag, and on the column you are going to order by.

Partition your table over that flag, so the dequeued transactions are not clogging up your queries.

If you really do get 1,000 messages every second, that results in 86,400,000 rows a day. You might want to think of some way to clean up old rows.
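As a sketch (SQL Server syntax; the table and column names are only illustrative):

-- Index supporting "find the next unprocessed row in insertion order"
CREATE INDEX IX_Queue_Processed ON QueueTable (Processed, ID)

-- Periodic cleanup of old, already-processed rows
DELETE FROM QueueTable
WHERE Processed = 1
  AND CreatedAt < DATEADD(day, -7, GETDATE())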

Peter Lang
  • By `flag` I mean some column to remember, if a row has already been processed by your client. – Peter Lang Feb 01 '10 at 15:51
  • I believe he meant that you can add a column to your tables - maybe Dequeued - that will hold the status of each transaction. Since you are not deleting the rows once you dequeue them, you should have a way to know what transactions to ignore. You can have this be a bit field, with 0 for queued and 1 for dequeued. – Waleed Al-Balooshi Feb 01 '10 at 15:54
  • ... and then partition the table over that field, so the dequeued transactions are not clogging up your queries. – David Schmitt Feb 01 '10 at 16:23
  • @David Schmitt: I put your words into my answer as I found no better ones. Hope you don't mind... – Peter Lang Feb 01 '10 at 16:44
  • "The question what is a flag" is all about the context. In the context of relationship database design, "flag" is a four-letter word. – Craig Tullis Jul 02 '19 at 01:56

Everything depends on your database engine/implementation.

For me, a simple queue on a table with the following columns usually works:

id / task / priority / date_added

I used priority and task to group tasks, and in the case of a duplicated task I chose the one with the higher priority.
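A rough sketch of that layout and a "pick the next task" query (SQL Server syntax; all names are illustrative):

CREATE TABLE task_queue
(
    id         INT IDENTITY(1,1) PRIMARY KEY,
    task       NVARCHAR(200) NOT NULL,
    priority   INT NOT NULL DEFAULT 0,
    date_added DATETIME NOT NULL DEFAULT GETDATE()
)

-- Next task: highest priority first, oldest first within the same priority
SELECT TOP 1 id, task
FROM task_queue
ORDER BY priority DESC, date_added ASC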

And don't worry - for modern databases "thousands" is nothing special.

bluszcz

This will not be any trouble at all as long as you use something to keep track of the datetime of the insert. See here for the MySQL options. The question is whether you only ever need the absolute most recently submitted item or whether you need to iterate. If you need to iterate, then what you need to do is grab a chunk with an ORDER BY statement, loop through it, and remember the last datetime so that you can use it when you grab your next chunk.
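A sketch of that chunked iteration (names are illustrative; TOP is SQL Server syntax, use LIMIT on MySQL):

-- Grab the next chunk after the last datetime we already processed
SELECT TOP 100 id, payload, created_at
FROM queue_table
WHERE created_at > @last_seen        -- datetime remembered from the previous chunk
ORDER BY created_at ASC
-- then remember MAX(created_at) from this chunk and use it as @last_seen next time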

David Berger

Perhaps adding a LIMIT 1 to your select statement would help ... forcing the return after a single match...
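For example (MySQL/PostgreSQL syntax; the table and column names are only illustrative):

SELECT * FROM queue_table ORDER BY id ASC LIMIT 1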

Reed Debaets
  • What's the difference with TOP 1? – Shayan Feb 01 '10 at 15:51
  • I know that TOP 1 in SQL Server is the same thing as LIMIT 1 in Postgres. I imagine all the other vendors would accept one or the other. – Matt Feb 01 '10 at 15:55
  • I'll be honest, I didn't realize they were equivalent to the same thing ... I've never used the TOP syntax, only the LIMIT ... this is why I love StackOverflow: even in providing an answer, I learn something new. – Reed Debaets Feb 01 '10 at 16:34

Create a clustered index over a date (or autoincrement) column. This will keep the rows in the table roughly in index order and allow fast index-based access when you ORDER BY the indexed column. Using TOP X (or LIMIT X, depending on your RDBMS) will then only retrieve the first X items from the index.
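A sketch in SQL Server syntax (names are illustrative):

-- Clustered index keeps the rows stored roughly in queue order
CREATE CLUSTERED INDEX IX_Queue_CreatedAt ON QueueTable (CreatedAt)

-- Only the first rows of the index need to be read
SELECT TOP 1 *
FROM QueueTable
ORDER BY CreatedAt ASC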

Performance warning: you should always review the execution plans of your queries (on real data) to verify that the optimizer doesn't do unexpected things. Also try to benchmark your queries (again on real data) to be able to make informed decisions.

David Schmitt

Since you don't delete the records from the table, you need to have a composite index on (processed, id), where processed is the column that indicates whether the current record has been processed.
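For example (a sketch; the table and column names are illustrative):

-- Composite index so "first unprocessed row" is a single index seek
CREATE INDEX IX_Queue_Processed_ID ON QueueTable (Processed, ID)

SELECT TOP 1 ID
FROM QueueTable
WHERE Processed = 0
ORDER BY ID ASC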

The best thing would be to create a partitioned table for your records and make the PROCESSED field the partitioning key. This way, you can keep three or more local indexes.

However, if you always process the records in id order and have only two states, updating the record would mean just taking the record from the first leaf of the index and appending it to the last leaf.

The currently processed record would always have the least id of all unprocessed records and the greatest id of all processed records.

Quassnoi
  • I'd like to keep the processed field in a different table with a foreign key to this table, to minimize the locking effect between different parts of the program. – Shayan Feb 01 '10 at 16:04
  • `@Shayan`: this will severely impact your select performance. And you need to lock the field while processing anyway. – Quassnoi Feb 01 '10 at 16:18

I had the same general question of "how do I turn a table into a queue" and couldn't find the answer I wanted anywhere.

Here is what I came up with for Node/SQLite/better-sqlite3. Basically just modify the inner WHERE and ORDER BY clauses for your use case.

const crypto = require("crypto");
// `status` is assumed to be a constants object defined elsewhere in this module

module.exports.pickBatchInstructions = (db, batchSize) => {
  const buf = crypto.randomBytes(8); // Create a unique batch identifier

  const q_pickBatch = `
    UPDATE
      instructions
    SET
      status = '${status.INSTRUCTION_INPROGRESS}',
      run_id = '${buf.toString("hex")}',
      mdate = datetime(datetime(), 'localtime')
    WHERE
      id IN (SELECT id
        FROM instructions
        WHERE
          status is not '${status.INSTRUCTION_COMPLETE}'
          and run_id is null
        ORDER BY
          length(targetpath), id
        LIMIT ${batchSize});
  `;
  db.prepare(q_pickBatch).run(); // Claim the batch: set the status and the run id

  const q_getInstructions = `
    SELECT
      *
    FROM
      instructions
    WHERE
      run_id = '${buf.toString("hex")}'
  `;
  const rows = db.prepare(q_getInstructions).all(); // Fetch all rows claimed under this batch id

  return rows;
};
Daniel Kaplan

A very easy solution for this, in order to avoid transactions, locks, etc., is to use the change tracking mechanism (not change data capture). It uses versioning for each added/updated/removed row, so you can track which changes happened after a specific version.

So, you persist the last version and query for the new changes.

If a query fails, you can always go back and query the data from the last version again. Also, if you don't want to get all changes in one query, you can get the top n ordered by version and store the greatest version you got, to use when you query again.

See this for an example: Using Change Tracking in SQL Server 2008
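A sketch of what the polling side could look like (the database, table, key column, and version bookkeeping are assumptions for illustration):

-- One-time setup: enable change tracking on the database and the source table
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING

-- Poll: fetch everything that changed since the version stored after the previous poll
DECLARE @last_sync_version BIGINT = 42   -- persisted by the consumer

SELECT ct.SYS_CHANGE_VERSION, ct.SYS_CHANGE_OPERATION, o.*
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct
LEFT JOIN dbo.Orders o ON o.OrderID = ct.OrderID   -- OrderID assumed to be the primary key
ORDER BY ct.SYS_CHANGE_VERSION

-- Store the new high-water mark to use as @last_sync_version next time
SELECT CHANGE_TRACKING_CURRENT_VERSION()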

George Mavritsakis
  • How does change tracking help you use a database table as a queue? In a queue, you want to get the next available task (in FIFO order) which has not already been processed, and ensure that item only gets processed once. Change tracking solves an entirely different problem-- which rows of a table have changed since I last queried. I'm not seeing the connection. – Brian Rogers Nov 17 '12 at 01:32
  • Good point Brian, and you are right. I proposed change tracking so that queue tables would not be needed at all. That was my point. Instead of using triggers (possibly) or something else to fill the queue, someone could use the change tracking mechanism to get changes right from the source tables, as long as they want to track changes. Thanks for the comment. – George Mavritsakis Nov 18 '12 at 09:30