
I've got a job processor which needs to handle ~300 jobs in parallel (jobs can take up to 5 minutes to complete, but they are usually network-bound).

The issue I've got is that jobs tend to come in clumps of a specific type. For simplicity, let's say there are six job types, JobA through JobF.

JobA - JobE are network bound and can quite happily have 300 running together without taxing the system at all (actually, I've managed to get more than 1,500 running side-by-side in tests). JobF (a new job type) is also network-bound, but it requires a considerable chunk of memory and actually uses GDI functionality.

I'm making sure I carefully dispose of all GDI objects with `using` blocks, and according to the profiler, I'm not leaking anything. It's simply that running 300 JobF in parallel uses more memory than .NET is willing to give me.

What's the best practice way of dealing with this? My first thought was to determine how much memory overhead I had and throttle spawning new jobs as I approach the limit (at least JobF jobs). I haven't been able to achieve this as I can't find any way to reliably determine what the framework is willing to allocate me in terms of memory. I'd also have to guess at the maximum memory used by a job which seems a little flakey.

My next plan was to simply throttle if I get OOMs and re-schedule the failed jobs. Unfortunately, the OOM can occur anywhere, not just inside the problematic jobs. In fact, the most common place is the main worker thread which manages the jobs. As things stand, this causes the process to do a graceful shutdown (if possible), restart and attempt to recover. While this works, it's nasty and wasteful of time and resources - far worse than just recycling that particular job.

Is there a standard way to handle this situation (adding more memory is an option and will be done, but the application should handle this situation properly, not just bomb out)?

Peter Mortensen
Basic
  • Perhaps you can check [Process.PrivateMemorySize64](http://msdn.microsoft.com/en-us/library/s80a75e5%28VS.80%29.aspx) or something similar before running a job – Harvey Kwok Jul 04 '12 at 16:44
  • @HarveyKwok Thanks, I'm aware of those (in fact, I report them to a central server with every "heartbeat" so I can keep an eye on things). The problem is, they only tell me what I'm using, not how much I _can_ use – Basic Jul 04 '12 at 16:49
  • 2
    Why don't you just assign a "cost" to a job. 1 for all cheap jobs, big number of the expensive one. Don't let another one start if the sum of active jobs is > 300. – Hans Passant Jul 04 '12 at 16:52
  • @HansPassant That's a nice idea, I'll certainly give it a whirl. I admit I was hoping for a more measured approach than me sticking my finger in the air and essentially guesstimating how many `JobF`s can run and dividing 300 by that number. – Basic Jul 04 '12 at 16:55
  • If you know, on average, how much one `JobF` instance takes up then maybe you can calculate before starting one if it can fit in the system available memory. Here you can find a way to get the available system virtual memory: http://stackoverflow.com/questions/3296211/c-how-to-get-the-size-of-available-system-memory. – Marcel N. Jul 04 '12 at 16:56
  • 1
    @thecoon I think you're missing a link there... – Basic Jul 04 '12 at 16:57
  • @thecoon Thanks, that looks interesting. The odd thing is that I get OOMs with free physical memory (according to resource monitor) so I'm assuming the framework imposes a "fair" limit on each application. I'll have to play with those when I'm in work tomorrow and see if any tend to 0 as I start to get OOMs – Basic Jul 04 '12 at 17:42
  • Could you have 6 job processors (for each type) and tune them independently for maxparallel – paparazzo Jul 04 '12 at 19:36
  • @Blam Funny you should mention that, I actually had 5 job processors previously but was asked to combine them to have a unified queue as FIFO behavior was preferred. In addition, it was tricky to allow one job to "burst" resources if the others were idle without some interlocking which impacted performance. That said, I may end up with a 5-and-1 approach at this rate.... – Basic Jul 04 '12 at 21:47
  • But the process could consider resources total but tune in a way specific to the process. – paparazzo Jul 04 '12 at 22:25
  • @Blam: Can you provide a hypothetical example? Do you mean (say) each [job] processor would check resources and determine how many workers to run in parallel? If so, what check specifically would you recommend? That's what I'm having issues with (I'm quite happy to have the resource check code in a central factory/similar rather than queue specific). It's possible I've completely misunderstood so please correct me if I have. – Basic Jul 04 '12 at 22:31
  • I should have commented "Could each process consider resources in total and tune to that?" But I will take a very simplistic shot at an answer. – paparazzo Jul 04 '12 at 22:46
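Hans Passant's cost idea from the comments above could be sketched as a weighted gate. Everything here is illustrative: the class name is made up, and the per-job costs would have to come from profiling how much a real `JobF` actually consumes relative to the cheap jobs.

```csharp
using System;
using System.Threading;

// Hypothetical weighted gate: cheap jobs cost 1 slot, a JobF costs many.
// Total slots and per-job costs are guesses to be tuned from profiling.
class WeightedJobGate
{
    private readonly object _lock = new object();
    private int _available;

    public WeightedJobGate(int totalSlots)
    {
        _available = totalSlots;
    }

    public void Acquire(int cost)
    {
        lock (_lock)
        {
            while (_available < cost)
                Monitor.Wait(_lock);   // block until enough slots are free
            _available -= cost;
        }
    }

    public void Release(int cost)
    {
        lock (_lock)
        {
            _available += cost;
            Monitor.PulseAll(_lock);   // wake waiters so they re-check
        }
    }
}
```

A JobA–JobE worker would call `gate.Acquire(1)` and a JobF worker something like `gate.Acquire(30)`, releasing the same cost in a `finally` block. One caveat: a large-cost job can starve behind a steady stream of cheap ones; a FIFO fairness queue in front of `Acquire` would fix that.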

4 Answers


it's simply that running 300 JobF in parallel uses more memory than .NET is willing to give me.

Well then, just don't do this. Queue up your jobs in the system ThreadPool. Or, alternatively, scale out and distribute the load to more systems.

Also, take a look at CERs (constrained execution regions) to at least run cleanup code if an out-of-memory exception happens.

UPDATE: Another thing to be aware of, since you mentioned you use GDI, is that it can throw an OutOfMemoryException for things that are not out of memory conditions.
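The memory-gate idea mentioned in the comments on this answer can be tried with `System.Runtime.MemoryFailPoint`: it probes for headroom up front and throws `InsufficientMemoryException` before the job starts, instead of a random `OutOfMemoryException` mid-flight. A sketch, where the 200 MB figure and the `RunJobF`/`Requeue` helpers are assumptions, not anything from the question:

```csharp
using System;
using System.Runtime;

class JobFGate
{
    // Hypothetical stand-ins for the real job body and requeue logic.
    static void RunJobF() { /* network + GDI work */ }
    static void Requeue() { /* put the job back on the queue */ }

    public static void Main()
    {
        try
        {
            // Ask the CLR whether ~200 MB of headroom is likely available
            // before committing to the expensive job.
            using (new MemoryFailPoint(200))
            {
                RunJobF();
            }
        }
        catch (InsufficientMemoryException)
        {
            // Not enough headroom: recycle just this job, not the process.
            Requeue();
        }
    }
}
```

`MemoryFailPoint` is a probabilistic check, not a reservation, so it reduces rather than eliminates mid-job OOMs.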

Jordão
  • Sorry, but that's not much help - I'm aware of the ThreadPool and TPL. Neither has the flexibility to handle the queue. I've actually got a module which replaces my workers with TPL tasks. Unfortunately, it's worse at memory management than I am. As mentioned in my Q, I'm aware that I need to run less than 300 of these tasks. What I'm looking for is a reliable, robust way to determine how many of type `JobF` I can spin up given the current resource limits – Basic Jul 04 '12 at 21:50
  • That said, the CER stuff definitely looks useful and I haven't read that article before, thank you. I'll have a skim now – Basic Jul 04 '12 at 21:52
  • @Basic: I believe then that this is a matter of tuning and finding a way to estimate how much memory a job will take, maybe based on past jobs. Start collecting statistics about the jobs... – Jordão Jul 04 '12 at 23:09
  • You may well be right. Incidentally, I've edited the sentence you've quoted so edited your answer to match. The sentiment is the same – Basic Jul 05 '12 at 00:33
  • +1 The [FailFast/MemoryGates](http://msdn.microsoft.com/en-us/magazine/cc163716.aspx#S10) section looks like it _may_ save the day. It doesn't guarantee success but allows me to check for likely failure before committing myself. Thanks for the link - fascinating reading. I want to give @DanielMošmondor a chance to respond before I accept an answer but this is definitely promising – Basic Jul 05 '12 at 01:21
  • @Basic: I updated the answer with some information I think could be relevant. – Jordão Jul 05 '12 at 10:08
  • Thanks Jordao but I think I'm getting real OOMs as once they start coming in, they get raised throughout my code, even on threads which aren't touching GDI. – Basic Jul 05 '12 at 10:17

I am doing something remotely similar to your case, and I opted for an approach in which I have ONE task processor (a main queue manager that runs on ONE node) and as many AGENTS as needed, running on one or more nodes.

Each of the agents runs as a separate process. They:

  • check for task availability
  • download required data
  • process data
  • upload result

The queue manager is designed so that if any agent fails during execution of a job, the job is simply re-tasked to another agent after some time.

8 agents running side-by-side in one box

BTW, consider NOT having all the tasks run at once in parallel, since there really is some overhead (it might be substantial) in switching context. In your case, you might be saturating the network with unnecessary PROTOCOL traffic instead of real DATA traffic.

Another fine point of this design is that if I start to fall behind on data processing, I can always turn on one more machine (say, an Amazon EC2 instance) and run several more agents to help complete the task base more quickly.

In answer to your question:

Every host will take as much as it can, since there is a finite number of agents running on one host. When one task is DONE, another is taken, ad infinitum. I don't use a database. Tasks aren't time-critical, so I have one process that goes round and round over the incoming data set and creates new tasks if something failed in previous run(s). Concretely:

http://access3.streamsink.com/archive/ (source data)

http://access3.streamsink.com/tbstrips/ (calculated results)

On each queue manager run, source and destination are scanned, resulting sets subtracted and filenames turned into tasks.
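That rescan step is essentially a set subtraction over filenames. A minimal sketch, with in-memory name sets standing in for the real directory scans of the two URLs above (the `PendingTasks` helper and the sample filenames are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RescanDemo
{
    // Tasks = source filenames minus filenames already present in results.
    public static List<string> PendingTasks(
        IEnumerable<string> sourceNames, IEnumerable<string> doneNames)
    {
        var pending = new HashSet<string>(sourceNames);
        pending.ExceptWith(doneNames);        // set subtraction
        return pending.OrderBy(n => n).ToList();
    }

    public static void Main()
    {
        var tasks = PendingTasks(
            new[] { "a.ts", "b.ts", "c.ts" },  // scanned from the archive
            new[] { "a.ts" });                 // scanned from the results
        Console.WriteLine(string.Join(",", tasks));   // b.ts,c.ts
    }
}
```

Because failed or crashed jobs never produce a result file, they simply reappear as pending tasks on the next scan, which is what gives the design its retry-for-free property.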

And still some more:

I am using web services to get job info/return results and simple HTTP to get the data for processing.

Finally:

This is the simpler of the 2 manager/agent pairs that I have - the other one is somewhat more complicated, so I won't go into detail about it here. Use the e-mail :)

Daniel Mošmondor
  • Thanks Daniel, I hadn't considered this but it seems to be an elegant solution. We're also using EC2 and multiple queue processing hosts is on the roadmap. Could you provide some detail on how you're distributing tasks between hosts? Is it a double-flagging mechanism on a Db record or something else? Are you using EF? I've hit some concurrency/caching issues even when creating/destroying DBContexts for each operation. – Basic Jul 05 '12 at 01:06
  • One of the issues I've been shelving for the moment is that `JobF`s generate image files locally so I also need some way to either collate them in a central location or retrieve them from the appropriate instance on demand (although that's beyond the scope of this Q) – Basic Jul 05 '12 at 01:09

Ideally you could partition by process profile: CPU bound, memory bound, IO bound, network bound. I am a rookie at parallel processing, but what TPL does well is CPU-bound work, and you cannot really tune much past MaxDegreeOfParallelism.

A start: CPU-bound work gets MaxDegreeOfParallelism = System.Environment.ProcessorCount - 1

And everything else gets MaxDegreeOfParallelism = 100. I know you said the network stuff will scale bigger, but at some point the limit is your bandwidth. Is spinning up 300 jobs (that eat memory) really giving you more throughput? If so, look at the answer from Jordão.
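The two-profile tuning above could be sketched like this; the job collections and the specific limits (ProcessorCount - 1 vs. 100) are taken straight from the answer, everything else is invented for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TuneDemo
{
    // Run a batch of items under a given parallelism profile.
    public static long SumWithOptions(int[] items, ParallelOptions options)
    {
        long sum = 0;
        Parallel.ForEach(items, options, i => Interlocked.Add(ref sum, i));
        return sum;
    }

    public static void Main()
    {
        // CPU-bound profile: leave one core for the rest of the process.
        var cpuBound = new ParallelOptions
        {
            MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
        };
        // Network-bound profile: far more workers than cores.
        var networkBound = new ParallelOptions { MaxDegreeOfParallelism = 100 };

        Console.WriteLine(SumWithOptions(new[] { 1, 2, 3, 4, 5 }, cpuBound));      // 15
        Console.WriteLine(SumWithOptions(new[] { 1, 2, 3, 4, 5 }, networkBound));  // 15
    }
}
```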

paparazzo
  • "Is spinning up 300 jobs ... really giving you more throughput?" Another interesting question and the answer is; "It depends entirely on the remote server". If the server responds quickly, no (Time to instantiate and run a new thread > the time to wait for an old thread to complete and recycle). Conversely, if the remote server responds slowly (30-300s), I end up with a lot of waiting around and little processing. I haven't got the balance right yet but 300 is definitely in the right ballpark. I don't know in advance what servers I'll be talking to, so it's the best I've come up with so far – Basic Jul 05 '12 at 00:28
  • Because of the issue which prompted this Q, I'm debating a more dynamic approach overall - eg take into account server performance to date when guesstimating parallel task count. This is technically flawed (ServerA slow doesn't imply ServerB slow) but taken over a large enough group of servers, I'll probably get a relatively decent average based on the bottleneck (assuming it's consistent, eg my bandwidth) – Basic Jul 05 '12 at 00:30
  • OK but still back to CPU based tasks are not the same. MaxDegreeOfParallelism = System.Environment.ProcessorCount -1 is a start. At this point have given all I got. And no thankyou for the +1 – paparazzo Jul 05 '12 at 00:42
  • apologies, but at the time I commented, I hadn't +1'd any answers as I was waiting to see what was viable. I've rectified the situation now :) That said, as mentioned, my tasks aren't CPU bound so limiting based on that isn't much help – Basic Jul 05 '12 at 01:11

If your objects implement the IDisposable interface, you should not rely on the Garbage Collector, because that could produce a memory leak.

For example, if you have that class:

class Mamerto : IDisposable
{
    public void methodA()
    {
        // do something
    }

    public void methodB()
    {
        // do something
    }

    public void Dispose()
    {
        // release resources
    }
}

And you use that class in that way:

using( var m = new Mamerto() )
{
    m.methodA();
    m.methodB();
    // you should call dispose here!
}

The Garbage Collector will mark the m object as "ready to delete", putting it on the Gen 0 collection. When the Garbage Collector tries to delete all the objects in Gen 0, it detects the Dispose method and automatically promotes the object to Gen 1 (because it's not "so easy" to delete that object). Gen 1 objects are not checked as often as Gen 0 objects, which could lead to a memory leak.

Please read this article for further info: http://msdn.microsoft.com/en-us/magazine/bb985011.aspx

If you proceed with an explicit Dispose, then you can avoid that annoying leak.

using( var m = new Mamerto() )
{
    m.methodA();
    m.methodB();
    m.Dispose();
}
Jordi
  • Can you explain more / link to an example? Essentially, I'm doing `Using Gfx as Graphics ... End Using` which, I believe, implicitly calls `.Dispose` on the object once we hit the `End Using`. What do you suggest instead? – Basic Jul 04 '12 at 16:56
  • If you use a "using" clause, that promotes the object to be deleted by the garbage collector, but there is no way to know if the object will be deleted immediately. If the object implements the disposable interface, the garbage collector doesn't delete it immediately; it simply promotes it to another generation, and in the end it will take a much greater amount of time to delete it. About the garbage collector & gens: http://msdn.microsoft.com/en-us/library/ms973837.aspx – Jordi Jul 04 '12 at 17:06
  • @Jordi This is the first time I've heard people suggesting not to use `using` to do deterministic memory management. I believe the thing he cares more about is actually those unmanaged resources. Managed memory is not significant in his case. – Harvey Kwok Jul 04 '12 at 17:09
  • I am not suggesting not using "using". What I am trying to explain is that the garbage collector has a list of objects ready to be deleted. If your object implements a Dispose method, the garbage collector detects work to be done and puts the object in another list. The problem with that list is that it is not checked as often as the other. – Jordi Jul 04 '12 at 17:18
  • In order to clarify a little bit: if you rely on the garbage collector to execute your Dispose method, you can end up with a memory leak. If you use a "using" clause and before leaving you explicitly call the Dispose method, then you are doing fine. – Jordi Jul 04 '12 at 17:28
  • So you're suggesting I should do `Using Gfx as Graphics ... Gfx.Dispose() ... End Using`? @HarveyKwok is right that the memory used by GDI is considerably larger than the managed object pointing at it. Can you explain why this is any better than just letting the End Using call `Dispose()`? My understanding is that objects shouldn't be elevated to the next gen unless they also implement a `Finalizer`, and I can't see anything in the linked GC page (which I've read before) that says otherwise? – Basic Jul 04 '12 at 17:35
  • I just updated my answer in order to avoid confusion and clarify this delicate subject. English is not my native language, sorry if I caused confusion, I'm just new here :). @Basic, please try it with the explicit Dispose. Easy to change, easy to test! – Jordi Jul 04 '12 at 19:12
  • 2
    @Jordi: the `using` statement will call `Dispose()` in a finally block; there's no need to call it explicitly like that. – Jordão Jul 04 '12 at 19:22
  • @Jordão: problem is "when" :) – Jordi Jul 04 '12 at 19:42
  • @Jordi, "when" is not the problem, the finally block runs _just after_ the code inside the `using` block finishes (either normally or with an exception), calling `Dispose()`. – Jordão Jul 04 '12 at 20:11
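To make Jordão's point concrete: a `using` block is compiled into a try/finally that calls `Dispose()` deterministically the moment the block exits, so an extra explicit `Dispose()` inside it is redundant. A small self-contained demonstration (the `Resource` class is invented; it just records whether `Dispose` ran):

```csharp
using System;

class Resource : IDisposable
{
    public bool Disposed;
    public void Dispose() => Disposed = true;
}

class Demo
{
    static void Main()
    {
        var r = new Resource();
        using (r)
        {
            // work with r; Disposed is still false in here
        }
        // The compiler-generated finally block has already called Dispose():
        Console.WriteLine(r.Disposed);   // True
    }
}
```

Note this governs only when `Dispose()` runs; when the managed object's memory is reclaimed is still up to the GC, which is the distinction the comment thread above was circling around.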