
I'm designing an internal company web site where users can submit jobs for computation. An important factor in my design is to persist the job in the queue until it's completed, even if there is a system failure.

It seems the internet is against the idea, as it's "not really the purpose of a database" and better suited to a key/value store like Redis (or a job queue built on Redis, like Kue for Node.js). I think I get it: the point of that design is to avoid overburdening the database with reads and writes for the fairly transient data you'd find in a job queue. In my use case, though, database usage would be pretty low, and the persistence a database offers is exactly the key feature I'm looking for.

In my reading I've found that some key/value stores, like Redis, have a persist function but it's not really built to make sure all data is recoverable in case the system goes down.

Am I missing something here or does this sound about right?

Leif
  • I would have to say that I agree with the internet in saying that this is better left for a "lighter" data storage system like `redis` or even maybe `mongo`. My questions would be A.) does the company use the DB already, and how often? and B.) If you are using it, how much of an impact will you have on its read/write times? – Derek Pollard Aug 18 '16 at 05:12
  • I would also recommend `rabbitmq`. – sobolevn Aug 18 '16 at 05:12
  • I'm using mongo for my database, and no, so far it's just my application making use of the database. Not sure how much of an impact on read/write times I would make yet. – Leif Aug 18 '16 at 05:19

1 Answer


In my reading I've found that some key/value stores, like Redis, have a persist function but it's not really built to make sure all data is recoverable in case the system goes down.

Redis persists data from memory to disk in the background (it forks a child process to write RDB snapshots). A crash won't corrupt the database itself, but the system can go down before or during a snapshot, and you'll lose any data written after the last successful one.
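
For context, the trade-off described above is configurable in `redis.conf`. RDB snapshotting and the append-only file (AOF) are Redis's two persistence modes; AOF with a stricter `appendfsync` policy narrows the data-loss window at a throughput cost. The values below are illustrative, not recommendations:

```conf
# RDB: snapshot to dump.rdb if at least 1 key changed in the last 900 seconds
save 900 1

# AOF: log every write command to disk; the log is replayed on restart
appendonly yes

# fsync policy: "always" = fsync after every write (safest, slowest);
# "everysec" = fsync once per second (default; loses at most ~1s on a crash)
appendfsync everysec
```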

If your Redis server is up and running 99.9% of the time this may be acceptable, but the risk never goes away entirely.

At the end of the day, my best advice is to use the right tool for the job: a general-purpose database, whether SQL or NoSQL, isn't built for job queueing. Use an existing tool that already does this, like RabbitMQ.
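
To make "persist the job until it's completed" concrete, here is a minimal stdlib sketch of the write-ahead pattern that durable brokers like RabbitMQ implement for you (with far more robustness). The class name and log format are my own invention for illustration: a job is fsynced to an append-only log before it is considered enqueued, and a later "done" record retires it, so a crash can replay unfinished jobs.

```python
import json
import os


class FileBackedQueue:
    """Append-only job log: jobs survive a crash until explicitly marked done."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        # Append and fsync before returning, so the record is durably on disk.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def enqueue(self, job_id, payload):
        self._append({"id": job_id, "payload": payload, "done": False})

    def mark_done(self, job_id):
        self._append({"id": job_id, "payload": None, "done": True})

    def pending(self):
        # Replay the log: any job without a later "done" record is still pending.
        state = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["done"]:
                        state.pop(rec["id"], None)
                    else:
                        state[rec["id"]] = rec["payload"]
        return state
```

On restart after a crash, `pending()` rebuilds the outstanding jobs from the log, which is exactly the recovery guarantee a durable queue gives you; a real broker adds consumer acknowledgements, redelivery, and concurrency on top.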

Matías Fidemraizer
  • I'm still a little uneasy about the prospect of losing ANY data, even if the server is stable and the chance is slim. People in my industry can be quite irritable when things don't go as planned. Would another option be to store the job queue information in a separate database (even on a remote server) so the "good" database with persistent information that counts long-term isn't getting burdened by the job queue? – Leif Aug 18 '16 at 22:06
  • @Leif No, the solution is using a reliable message queueing. I've pointed you out to one: RabbitMQ. If you're in Azure, Azure Service Bus is very powerful too. – Matías Fidemraizer Aug 18 '16 at 22:54