2

Or rather, how do I guarantee that a job won't start running if an equivalent job is already running? Basically, I have a bunch of different tasks that I need completed, and sometimes the same task will be requested more than once, but it only needs to be completed once*. How do I implement this in AWS? I've tried SWF, but couldn't guarantee that activity IDs were unique across disparate workflows. I'm looking at SQS, but I see no guarantees about unique message IDs, nor any way to request a list of all message IDs currently in the queue.

Any help would be greatly appreciated

*unless it's called again in the future, because the job might change over time, but that's neither here nor there

  • Check some of the answers to this question: http://stackoverflow.com/questions/13484845/what-is-a-good-practice-to-achieve-the-exactly-once-delivery-behavior-with-ama – Mark B Sep 04 '15 at 05:13
  • 1
    That's a different issue. I'm not trying to figure out how to guarantee that messages in the queue are sent exactly once. I'm trying to prevent the queue from accepting messages that are duplicates of messages already in the queue. – user3784712 Sep 11 '15 at 19:33

2 Answers

0

The answer depends on how you handle the situation where the job starts and then fails: how long are you willing to wait before declaring the worker dead and restarting the job, and, most importantly, how do you ensure that a dead worker won't make any more progress / will abort the job?

Ideally you would split the work into chunks and ensure that nothing bad happens if the same chunk is done twice. Exactly-once execution is going to be very hard (impossible for all edge cases), because you run into the case of executing the work but failing to report it, and then you have no way of knowing the work was already done.
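As a rough illustration of what "nothing bad happens if the same chunk is done twice" can look like (not part of the original answer): record chunk completion in a shared store with a conditional write, so a second run of the same chunk sees the marker and skips the work. The DynamoDB table name `job_chunks` and attribute names below are assumptions for the sketch.

```python
# Minimal sketch: make each chunk idempotent via a conditional write in a
# shared store. Table and attribute names are illustrative assumptions.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def run_chunk_once(chunk_id, do_work):
    """Run do_work() unless this chunk was already claimed by another worker."""
    try:
        # Conditional put: fails if another worker already recorded this chunk.
        dynamodb.put_item(
            TableName="job_chunks",
            Item={"chunk_id": {"S": chunk_id}, "status": {"S": "started"}},
            ConditionExpression="attribute_not_exists(chunk_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # someone else already claimed (or finished) this chunk
        raise

    do_work()

    # Mark completion. If the worker dies before this line, the chunk may be
    # retried elsewhere after a timeout, which is why do_work() itself must be
    # safe to repeat.
    dynamodb.update_item(
        TableName="job_chunks",
        Key={"chunk_id": {"S": chunk_id}},
        UpdateExpression="SET #s = :done",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":done": {"S": "done"}},
    )
```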

Mircea
  • 10,216
  • 2
  • 30
  • 46
  • Multiple executions of the same job won't hurt anything, so it doesn't HAVE to happen exactly once, but we want to eliminate as much of the redundant work as possible. We abort jobs that time out, it's something like three minutes, though most jobs are a lot faster than that. – user3784712 Sep 04 '15 at 21:37
  • If you can tolerate multiple executions of the same job, you could simply use SQS: queue up the things that need to be done and have workers pull from the queue. With SQS you have to acknowledge the message you've pulled once you've completed it, so if the worker dies and/or takes too long, the message simply becomes visible in the queue again (see the sketch after these comments). – Mircea Sep 05 '15 at 23:41
  • But how does that prevent duplicate messages? If I send the same message twice, then there'll be two copies of the same message in the queue, and then workers will pick the job out of the queue twice – user3784712 Sep 10 '15 at 06:14
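To make the comment above concrete, here is a minimal sketch of that SQS worker loop; the queue URL is a placeholder and the visibility timeout of three minutes is taken from the job timeout mentioned earlier. This does not deduplicate messages, it only ensures a failed or slow worker's message resurfaces.

```python
# Minimal sketch of the SQS pull-and-acknowledge loop; queue URL is an
# assumed placeholder. A message is deleted only after the job succeeds, so a
# worker that dies or exceeds the visibility timeout lets the message
# reappear for another worker.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

def worker_loop(process_job):
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,     # long polling
            VisibilityTimeout=180,  # ~3 minutes, matching the job timeout above
        )
        for msg in resp.get("Messages", []):
            process_job(msg["Body"])
            # "Acknowledge" by deleting; skip this on failure and the message
            # becomes visible again after the visibility timeout.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg["ReceiptHandle"],
            )
```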
0

One possible solution would be to use a Redis server (which AWS provides as a service with ElastiCache) to implement a distributed lock. Redis is single-threaded, which makes it a very good candidate for such a job. You can find many details and implementation examples of distributed locking on the Redis website.
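For reference, a minimal sketch of the single-instance locking pattern described on that page, using the redis-py client; the endpoint, key names, and TTL are assumptions for the example.

```python
# Minimal sketch of a single-instance Redis lock: SET NX PX to acquire,
# a check-and-delete script to release. Endpoint, key prefix, and TTL are
# illustrative assumptions.
import uuid
import redis

r = redis.Redis(host="my-elasticache-endpoint", port=6379)  # placeholder endpoint

# Atomic check-and-delete so we only release a lock we still own.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def run_job_once(job_id, do_work, ttl_ms=180_000):
    token = str(uuid.uuid4())
    # nx=True: acquire only if no equivalent job currently holds the lock;
    # px=ttl_ms: the lock expires on its own if the worker dies mid-job.
    if not r.set(f"lock:{job_id}", token, nx=True, px=ttl_ms):
        return False  # an equivalent job is already running
    try:
        do_work()
    finally:
        r.eval(RELEASE_SCRIPT, 1, f"lock:{job_id}", token)
    return True
```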

Liviu Costea
  • 3,624
  • 14
  • 17
  • Would that require adding a lot of extra overhead? – user3784712 Sep 05 '15 at 00:40
  • On the infrastructure side, not at all, because Redis is offered as a service by AWS and you can have multiple slaves for high availability. In terms of libraries, there are already implementations available, some very simple with a small chance of failure, some more complex and better. In terms of speed, I think it will be pretty fast, because there isn't much to process and Redis keeps everything in RAM. Now you need to decide whether it is worth it, and to what level :) – Liviu Costea Sep 05 '15 at 14:43