Webjobs and large database queries

Question

Suppose I need to query a database where the result set is 100,000 rows or more, and I then need to process that data. Can this be done successfully in a continuous WebJob? If so, how is the queue managed? I currently have this question:

Webjob query being limited by take not processing any further data when triggered, or when being interrupted will not continue processing queue

That question discusses a problem with a continuous WebJob that uses a timer trigger. If the WebJob restarts, the queue is dumped; by dumped I mean that the queue is not processed any further. If a take is used to limit the rows in the query, the next poll event does not process any data.

So much is managed under the hood with these WebJobs that it is hard to get a good enough grasp of them to manage a large queue.

My question:

Are webjobs suited to processing large amounts of data?

If so, should they be continuous or scheduled and why?

Not sure why you're posting comments telling people how to vote/not-vote. That's not how StackOverflow works (but you should know that, given your almost-12K rep). — David Makogon, Aug 21 '16 at 14:14
Looks like a duplicate of [this one](http://stackoverflow.com/questions/32597003/azure-webjobs-where-hosted-are-they-safe-as-long-running-processes). — David Makogon, Aug 21 '16 at 14:17
Sure, they can handle large amounts of data. But the triggering mechanism, as far as I know, is not a SQL trigger but something else of your creation. This would imply some cooked-up agent that you write to perform the trigger. It can then work against your `database with a resulting set of 100,000 rows plus`. I don't know that I would make it continuous. They are prone to failure anyway. I would recommend leveraging a queue mechanism for status checks and a workflow, so that any failed webjob can reinvigorate itself. — Drew, Aug 21 '16 at 14:48
Which then begs the question: why even have it in a webjob? Why create some cron-like expression or use the portal when you have your C# app (or other) and you are determining a more fluid time schedule? When I say fluid, I mean it changes based on the heuristics you build in. — Drew, Aug 21 '16 at 14:55
I work with people on another platform but the analogies are identical. The first thing I try to do is build a separate agent, because invariably I get disappointed by the rigidity of an event scheduler or by the limits on its power; that is, it disallows certain calls. All of that goes away with an agent. So, yes, as you say, much can be done in them. But we have to prepare for what cannot. In our case "data loads" are forbidden (not by us but by the framework). There are other restrictions. Also, we need resiliency upon failure (sort of like Erlang's Let it Crash via Akka). — Drew, Aug 21 '16 at 15:07
In fact we want it to Crash. Why? Because first off we know it will, so why not build it from the ground up to deal with it. Second, we want to prove to ourselves we can still get paid when it does so — Drew, Aug 21 '16 at 15:08
Somewhat on topic, I wrote [this](http://stackoverflow.com/a/38022108) as a stub for event management and monitoring (and performance reporting). It is scaled way back as an answer, to say the least. It has an incarnation and an evtLog with `step`. It could be used cross-platform as a concept for whatever hooks one would want. Ignore that it is MySQL :p — Drew, Aug 21 '16 at 15:26

score 1 · Accepted Answer · answered Aug 21 '16 at 15:27

Are webjobs suited to processing large amounts of data?

Sure, why not? If, for whatever reasons, you don't trust the WebJobs SDK, there's nothing stopping you from writing a plain console application that does all the processing and deploying that as a WebJob. This way nothing is hidden or "managed away" from you.
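
As a rough illustration of the plain-console-application route, here is a minimal sketch in Python (not the WebJobs SDK) of processing a large result set in fixed-size batches with a stored checkpoint, so that a restarted run resumes after the last committed batch. The `items` and `checkpoint` tables, the function name, and the use of SQLite are all assumptions made for the example; ids are assumed to be positive and increasing.

```python
import sqlite3


def process_in_batches(conn, batch_size, handle_row):
    """Process all rows of the hypothetical `items` table in id order.

    The id of the last processed row is kept in a hypothetical `checkpoint`
    table, so a restarted run continues where the previous one stopped.
    Returns the number of rows handled by this call.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (last_id INTEGER NOT NULL)")
    row = conn.execute("SELECT last_id FROM checkpoint").fetchone()
    if row is None:
        conn.execute("INSERT INTO checkpoint (last_id) VALUES (0)")
        last_id = 0
    else:
        last_id = row[0]

    processed = 0
    while True:
        # Keyset paging: fetch the next batch strictly after the checkpoint.
        batch = conn.execute(
            "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return processed
        for item_id, payload in batch:
            handle_row(item_id, payload)
        # Advance the checkpoint only after the whole batch succeeded.
        last_id = batch[-1][0]
        conn.execute("UPDATE checkpoint SET last_id = ?", (last_id,))
        conn.commit()
        processed += len(batch)
```

If `handle_row` fails partway through a batch, the checkpoint is not advanced, so the rows of that batch are handled again on the next run; the handler should therefore tolerate seeing the same row twice.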

If so, should they be continuous or scheduled and why?

A continuous WebJob usually makes sense in the context of a trigger. You have some work waiting to be picked up and you signal that with a Storage Queue message or some other mechanism of your choosing (custom triggers).
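
To make the "work waiting to be picked up" idea concrete for a 100,000-row job, one option is to enqueue one message per batch of rows instead of one message for the whole query. The sketch below is only an illustration: an in-memory `deque` stands in for a Storage Queue, and the message shape (`offset`, `count`) and both function names are hypothetical.

```python
from collections import deque


def make_batch_messages(total_rows, batch_size):
    """Return one message per batch; each describes a slice of the result set."""
    return [
        {"offset": start, "count": min(batch_size, total_rows - start)}
        for start in range(0, total_rows, batch_size)
    ]


def drain(queue, process_batch):
    """Process queued messages in order and return how many completed.

    A message is removed only after process_batch succeeds, so a failure
    leaves it (and everything behind it) on the queue for the next run.
    """
    done = 0
    while queue:
        message = queue[0]      # peek without removing
        process_batch(message)  # may raise; the message then stays queued
        queue.popleft()         # delete only after success
        done += 1
    return done
```

With a batch size of 1,000, `make_batch_messages(100000, 1000)` yields 100 messages. If the job dies while handling one of them, only that batch is retried; the batches already completed are gone from the queue and are not repeated.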

A scheduled WebJob, well... it works on a schedule. Do you have one? Make it so then.

If none of that makes enough sense to shape a clear choice, why not just trigger it manually based on your own external logic?

From https://github.com/projectkudu/kudu/wiki/WebJobs-API#invoke-a-triggered-job:

Invoke a triggered job
POST /api/triggeredwebjobs/{job name}/run
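
A minimal Python sketch of building that request from external logic follows. Only the `/api/triggeredwebjobs/{job name}/run` path comes from the wiki excerpt above; the `{site}.scm.azurewebsites.net` host pattern, the use of HTTP Basic authentication with deployment credentials, and the function name are assumptions that should be checked against your own site.

```python
import base64
import urllib.parse
import urllib.request


def build_run_request(site_name, job_name, user, password):
    """Build (but do not send) a POST request that invokes a triggered WebJob."""
    # Assumed host pattern for the Kudu (SCM) site; adjust for your deployment.
    url = "https://{}.scm.azurewebsites.net/api/triggeredwebjobs/{}/run".format(
        site_name, urllib.parse.quote(job_name, safe="")
    )
    # Assumed auth scheme: HTTP Basic with the site's deployment credentials.
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"",  # empty body
        method="POST",
        headers={"Authorization": "Basic " + token.decode("ascii")},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the builder is kept separate so the scheduling logic can be exercised without network access.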
