I can't propose a simple solution, only a few directions to explore.
First, Step Functions has a dedicated way to handle long-running background work: activities (https://docs.aws.amazon.com/step-functions/latest/dg/concepts-activities.html). An activity is basically a queue: the state machine puts a task on it, and a worker polls for the task and later reports the result back.
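The whole activity protocol is two API calls; here is a minimal worker sketch with boto3 (the activity ARN and `do_the_work` are placeholders):

    import boto3, json

    sf = boto3.client('stepfunctions')
    while True:
        # long poll; blocks up to 60 s and returns an empty token if nothing arrived
        task = sf.get_activity_task(activityArn='arn:...activity:my-activity')
        if task.get('taskToken'):
            result = do_the_work(json.loads(task['input']))  # your handler
            sf.send_task_success(taskToken=task['taskToken'], output=json.dumps(result))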
If you want 100% serverless, this is going to be complicated or ugly:

- either, as you said, create a new step function for each file,
- or build an S3 poll loop into the state machine, using a custom error code and a `Retry` clause (sketched below).
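The `Retry` variant could look roughly like this: a checker Lambda (hypothetically named `check-duration-file` here) raises a custom `NotReadyYet` error while the result object is still missing from S3, and the `Retry` clause turns that into a poll loop:

    "check-duration": {
      "Type": "Task",
      "Resource": "arn:...lambda:check-duration-file",
      "Retry": [{
        "ErrorEquals": ["NotReadyYet"],
        "IntervalSeconds": 10,
        "BackoffRate": 1.0,
        "MaxAttempts": 360
      }],
      "End": true
    }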
If you can allocate a "1/8 micro" instance for a background worker, there is an approach that is not elegant but easy, and it reacts instantly. The low hardware requirement hints that we are going to use the machine only for synchronization.
Define a Step Functions activity, named for example `video-duration`.
Define an SQS queue for instant reaction, or poll S3 for the duration results.
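Both are one-time, one-call setups; a sketch with boto3 (the names are just examples):

    import boto3

    sf, sqs = boto3.client('stepfunctions'), boto3.client('sqs')
    ACTIVITY_ARN = sf.create_activity(name='video-duration')['activityArn']
    QUEUE_URL = sqs.create_queue(QueueName='video-duration-results')['QueueUrl']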
State machine definition (valid Amazon States Language, apart from the elided ARNs):

    {
      "StartAt": "ffprobe",
      "States": {
        "ffprobe": {
          "Type": "Task",
          "Resource": "arn:...lambda:launch-ffprobe",
          "Next": "wait-duration"
        },
        "wait-duration": {
          "Type": "Task",
          "Resource": "arn...activity:video-duration",
          "End": true
        }
      }
    }
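Once the real ARNs are filled in, this deploys with a single call; a sketch, where `definition` is the JSON above as a Python dict and the role is an IAM role you create that allows invoking the Lambda:

    sm_arn = sf.create_state_machine(
        name='video-duration-pipeline',            # example name
        definition=json.dumps(definition),
        roleArn='arn:...role/video-pipeline-role'  # elided, as above
    )['stateMachineArn']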
Background worker, as runnable Python rather than pseudocode (boto3; fill in the ARN and URL from the setup above):

    import boto3, json, threading

    sf, sqs = boto3.client('stepfunctions'), boto3.client('sqs')
    ACTIVITY_ARN = '...'     # from create_activity above
    QUEUE_URL = '...'        # from create_queue above
    statemap = {}            # filename -> {'waiter': taskToken, 'duration': result}
    lock = threading.Lock()  # sync() is called from both threads

    def thread1():  # receive task tokens from the state machine
        while True:
            task = sf.get_activity_task(activityArn=ACTIVITY_ARN)  # long poll, up to 60 s
            if task.get('taskToken'):
                sync(json.loads(task['input'])['filename'], waiter=task['taskToken'])

    def thread2():  # receive duration results via SQS (or poll S3 instead)
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
            for msg in resp.get('Messages', []):
                body = json.loads(msg['Body'])
                sync(body['filename'], duration=body['result'])
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])

    def sync(key, waiter=None, duration=None):
        with lock:
            state = statemap.setdefault(key, {})  # create the entry on first sight of the key
            if waiter:
                state['waiter'] = waiter
            if duration is not None:
                state['duration'] = duration
            if 'waiter' in state and 'duration' in state:  # both halves arrived, in either order
                sf.send_task_success(taskToken=state['waiter'],
                                     output=json.dumps({'duration': state['duration']}))
                del statemap[key]
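To run the worker, start the two loops as plain threads; the box does nothing but synchronization, so the smallest instance you can get is enough:

    if __name__ == '__main__':
        threading.Thread(target=thread1, daemon=True).start()
        thread2()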
S3 trigger, as a Lambda handler on the bucket's ObjectCreated events (clients `sf`, `sqs`, `s3` and `sm_arn` as above; `is_video`/`is_duration` sketched below):

    def handler(event, context):
        for rec in event['Records']:
            key = rec['s3']['object']['key']
            if is_video(key):       # fresh upload: start the pipeline
                sf.start_execution(stateMachineArn=sm_arn,
                                   input=json.dumps({'filename': key}))
            elif is_duration(key):  # ffprobe wrote its result
                content = s3.get_object(Bucket=rec['s3']['bucket']['name'],
                                        Key=key)['Body'].read().decode()
                sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=content)  # JSON: {"filename":..., "result":...}
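`is_video` / `is_duration` are whatever convention you pick for telling the two kinds of objects apart; a file-extension check is the simplest (the extensions here are purely an assumption):

    def is_video(key):
        return key.endswith(('.mp4', '.mov', '.mkv'))  # example video extensions

    def is_duration(key):
        return key.endswith('.duration')               # e.g. ffprobe writes <name>.duration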