
I want to process a CSV file when it is uploaded to blob storage. For this requirement, I am writing a WebJob with a blob trigger.

To ensure continuous CSV processing, I am writing a second WebJob, also with a blob trigger.

That way, if one WebJob fails, the other WebJob will process the CSV.

My problem is that when both WebJobs are running, they process the same CSV file and end up creating duplicate data.

How do I lock the file so that only one WebJob processes the CSV file?

Or

How can I trigger the second WebJob if the first WebJob is going to shut down?

Ashutosh B Bodake
  • What about proper exception handling so you won't need this in case of an exception thrown from your code? Otherwise maybe this can be of help: https://github.com/Azure/azure-webjobs-sdk-extensions#errortrigger? Also take a look at this answer https://stackoverflow.com/questions/35166010/azure-triggered-webjob-detecting-when-webjob-stops – Peter Bons May 22 '17 at 17:37
  • What about just scaling your job? I am pretty confident that blobs are locked while being processed. – Thomas May 22 '17 at 20:11
  • 1
    @Thomas. Blob don't lock the file. While they(two web jobs) got the new blob in container, Blob Trigger for both the web jobs starts executing. – Ashutosh B Bodake May 23 '17 at 06:01

2 Answers


How can I trigger the second WebJob if the first WebJob is going to shut down?

I suggest you use a try-catch block to handle exceptions in your first WebJob. If an exception occurs, we can write the blob name to a queue to trigger the other WebJob.

public static void ProcessCSVFile(
    [BlobTrigger("input/{blobname}.csv")] TextReader input,
    [Queue("myqueue")] out string outputBlobName,
    string blobname)
{
    try
    {
        // process the CSV file

        // if no exception occurs, set outputBlobName to null so nothing is enqueued
        outputBlobName = null;
    }
    catch
    {
        // add the blob name to the queue so the function named RepeatProcessCSVFile is triggered
        outputBlobName = blobname;
    }
}

We can create a QueueTrigger function in the other WebJob. In this function, we read the blob name from the queue message and re-process the CSV. If another exception occurs, we re-add the blob name to the queue, so the function will be executed again and again until the CSV file has been processed successfully.

public static void RepeatProcessCSVFile(
    [QueueTrigger("myqueue")] string blobName,
    [Queue("myqueue")] out string outputBlobName)
{
    try
    {
        // process the CSV file
        // (to read the blob content here, a Blob input binding such as
        // [Blob("input/{queueTrigger}.csv")] TextReader input could be added)

        // if no exception occurs, set outputBlobName to null so nothing is re-enqueued
        outputBlobName = null;
    }
    catch
    {
        // re-add the blob name to the queue; this function will run again
        // until the CSV file has been handled successfully
        outputBlobName = blobName;
    }
}
Amor
  • If this solution works for you, please come back and mark this reply as the answer. An answered thread is easier to find and will help others who encounter a similar issue. If you have further questions about this topic, please feel free to let me know. – Amor May 23 '17 at 05:18

I like Amor's solution, but have a few suggestions to add to it.

If you abandon the BlobTrigger approach and instead enqueue a Service Bus queue message indicating the blob that needs to be processed, you can trigger your processing with a ServiceBusTrigger. In the event that an exception occurs, abandon the message and it will become available for another processing attempt. This would let you have only one WebJob and still have redundancy.
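
A rough sketch of what such a function could look like (assuming the Microsoft.Azure.WebJobs.ServiceBus extension and a hypothetical queue named csv-to-process; the original answer does not include code):

using System.IO;
using Microsoft.Azure.WebJobs;

public static class CsvFunctions
{
    // Fires when a message (containing the blob name) arrives on the queue.
    // If processing throws, the message is not completed; Service Bus makes it
    // available again for another attempt (and dead-letters it after MaxDeliveryCount).
    public static void ProcessCsvMessage(
        [ServiceBusTrigger("csv-to-process")] string blobName,
        TextWriter log)
    {
        log.WriteLine("Processing blob: " + blobName);

        // open the blob (e.g. with the storage SDK) and process the CSV here;
        // throwing an exception causes the message to be retried
    }
}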

The other advantage of using a Service Bus queue is that you can get guaranteed at-least-once and at-most-once processing, along with guaranteed message locking when a message is read. This is not the case with a standard Storage queue. This would also give you a scalability option in the future if you wanted to add a second WebJob instance monitoring the same Service Bus queue.
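
To make the locking behaviour concrete, here is a rough illustration using the plain Service Bus client (Microsoft.ServiceBus.Messaging from the WindowsAzure.ServiceBus package); the queue name and connection string handling are just placeholders:

using Microsoft.ServiceBus.Messaging;

public static class CsvQueueExample
{
    public static void ReceiveOne(string connectionString)
    {
        // PeekLock locks the message while one receiver works on it, so a second
        // instance watching the same queue will not receive the same message.
        var client = QueueClient.CreateFromConnectionString(
            connectionString, "csv-to-process", ReceiveMode.PeekLock);

        BrokeredMessage message = client.Receive();
        if (message == null) return;

        try
        {
            string blobName = message.GetBody<string>();
            // process the CSV identified by blobName

            message.Complete();   // done: the message is removed from the queue
        }
        catch
        {
            message.Abandon();    // release the lock so another attempt can pick it up
        }
    }
}

The WebJobs SDK does this completing and abandoning for you when you use a ServiceBusTrigger function as sketched above.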

Rob Reagan