
I have an implementation of for loops that I am parallelizing using the TPL. I am using a Dell laptop with 4 GB RAM and a Core i3 processor. I have multiple Parallel.ForEach loops which are invoked using Parallel.Invoke. The program is an add-in for Enterprise Architect that creates the model diagrams and objects in EA.

Code is something like this:

Parallel.Invoke(() => parent1Creation(), () => parent2Creation(), ...);

where each parent creation is a Parallel.ForEach:

Parallel.ForEach(parents, parent =>
{
    // create the parent
    // create the children
    foreach (var child in parent.children)
    {
        childecreation();
    }

    foreach (var child2 in parent.children)
    {
        childecreation();
    }
    // can be any type and number of children
});

The issue is that when my loop size increases, i.e. to around 1,500-2,000 iterations, Enterprise Architect stops working.

Is this an issue caused by my laptop configuration, by the way I am using the parallel loops, or by Enterprise Architect itself?

How can I resolve it?

  • What is the reason to "parallelize" it? Is it cpu-bound and is `childecreation()` thread safe? – zerkms Sep 12 '16 at 05:17
  • Parallelizing an already parallel task! Either don't use Invoke, or don't use Parallel.ForEach. – Sarvesh Mishra Sep 12 '16 at 05:18
  • Hi, child creation is thread safe, and parent creation is independent of each other. Right now the complete execution takes 12 hours, as my loops have around 1k or more iterations and child creation can be a large loop too. I am trying to reduce the creation time. What is the best way to achieve that, using multithreading or any other approach? – Akanksha Sep 12 '16 at 06:21
  • Does the same thing happen with a version of the code that's not parallelized? Can you debug the code and see what it's doing when EA stops working? – svick Sep 12 '16 at 12:52
  • It's not of much use to parallelize things that afterwards go through the same bottleneck called EA. – qwerty_so Sep 12 '16 at 13:38
  • Instead of asking why your solution to a problem (unknown to us) is not working, it might be a better idea to explain the problem you are trying to solve. Then we might be able to suggest alternative solutions that don't suffer from the same problem. – Geert Bellekens Sep 12 '16 at 13:46
  • Hi, what does "stops working" mean? An exception, or something else? – slava Sep 12 '16 at 15:13
  • Possible duplicate of [When to use a Parallel.ForEach loop instead of a regular foreach?](http://stackoverflow.com/questions/12251874/when-to-use-a-parallel-foreach-loop-instead-of-a-regular-foreach) – Dah Sra Sep 14 '16 at 08:42

2 Answers


I don't suggest this strategy. Running lots of Parallel.ForEach loops at once won't necessarily help your performance (see the caveat later in the post), especially if each of the Parallel.ForEach loops is handling a large number of iterations. At some point, using additional threads won't benefit your performance anymore and will just add overhead.
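
For instance (a sketch, not the OP's actual code: parents1/parents2, CreateParent and childecreation stand in for whatever collections and methods the add-in really uses), you could flatten the nesting into a single Parallel.ForEach over all parents, so the TPL manages one pool of work instead of several competing loops:

using System.Linq;
using System.Threading.Tasks;

// Sketch only: combine every parent group into one sequence and let a single
// Parallel.ForEach partition it, instead of Parallel.Invoke starting several
// Parallel.ForEach loops that compete for the same thread pool.
var allParents = parents1.Concat(parents2).ToList();

Parallel.ForEach(allParents, parent =>
{
    CreateParent(parent);          // hypothetical: create the parent element

    foreach (var child in parent.children)
    {
        childecreation();          // the OP's child-creation method
    }
});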

The caveat here is that Parallel.ForEach is generally good (but not perfect) at selecting the optimal number of threads for a particular foreach loop. There's no explicit guarantee as to exactly how many threads a particular foreach loop will use (or even that it will run in parallel), so it's conceivable that multiple Parallel.ForEach loops will, in fact, enhance your performance. The best way to check that is to use the debugger to see how many threads it's actually using at any given point. If it's not what you'd expect, you might check the implementation of the code in the Parallel.ForEach loop (for example); there are other steps you could take at this point to try to improve the performance (e.g. a good async/await implementation for IO-bound and other non-CPU-bound operations so that the thread can do more work - see below).
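
One quick way to do that check outside the debugger (just a sketch; the Enumerable.Range stands in for the real work items) is to record the managed thread IDs the loop actually runs on:

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Record which managed threads the loop actually uses.
var threadIds = new ConcurrentDictionary<int, bool>();

Parallel.ForEach(Enumerable.Range(0, 2000), i =>
{
    threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
    // ... real per-iteration work would go here ...
});

Console.WriteLine($"Ran on {threadIds.Count} distinct threads " +
                  $"({Environment.ProcessorCount} logical cores available).");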

Trivial example: suppose you have a system where you have 4 threads and 4 cores and the 4 threads are the only things that are running on the system. (Obviously this'll never happen.) The sensible thing from a scheduling point of view would be to have each core handle one thread. Assuming that each of the threads is busy all the time (i.e. it's never sitting around waiting), how could adding additional threads improve your performance? If you start running, for example, 6 threads, then obviously at least one core will now have to run at least 2 threads, which adds extra overhead with no clear benefit. The simplifying (and possibly untrue) assumptions here are that your tasks are 100% CPU-bound and that the threads are, in fact, running on separate cores. If one of these assumptions is untrue, that's a clear opportunity for enhancement. For example, if a thread spends a significant amount of time waiting for results from IO-bound operations, multiple threads on the CPU could, in fact, improve performance. You could also consider an async/await implementation to improve performance.
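
If the work really is CPU-bound, one way to avoid that oversubscription (again only a sketch) is to cap the loop at the number of logical cores:

using System;
using System.Linq;
using System.Threading.Tasks;

// Cap the degree of parallelism for CPU-bound work so the scheduler isn't
// asked to run more busy threads than the machine has logical cores.
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(Enumerable.Range(0, 2000), options, i =>
{
    // CPU-bound per-iteration work goes here
});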

The point being that at some point adding additional threads won't give you any performance benefit, just added overhead (especially if the tasks involved are mostly CPU-bound rather than mostly IO-bound, for example). There's no way around that fact.

Non-CPU-bound operations (IO-bound tasks like calls to servers, for example), where the main holdup is waiting for a result from something external to the CPU/memory, are parallelized differently. In fact, async/await does not necessarily create new threads; one of its major behaviors is to return control to the caller of the method in question and "try" to do other work on the same thread if possible.
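
A generic sketch of that pattern for IO-bound work (the URLs and HttpClient here are illustrative, not anything from the OP's add-in):

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Start all the IO-bound requests, then await them together. No thread is
// blocked while the responses are in flight.
static async Task<string[]> FetchAllAsync(IEnumerable<string> urls)
{
    using var client = new HttpClient();
    var tasks = urls.Select(url => client.GetStringAsync(url)).ToList();
    return await Task.WhenAll(tasks);
}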

To repeat my favorite analogy, suppose that you go out to eat as part of a group of 10 people. When the waiter comes by to take orders, the first guy the waiter asks to order isn't ready, but the other nine people are. The correct thing for the waiter to do is, rather than wait for the first guy to be ready to order, to have the other 9 people order first and then have the first guy order afterwards if he's ready by then. He definitely does not bring in a second waiter to wait for the one guy to be ready; in this case, the second waiter probably wouldn't actually reduce the total amount of time taken to complete the order. This is basically what async/await tries to accomplish; if all an operation is doing is waiting for a result from a server, for example, ideally you'd be able to do other things while it's waiting.

On the other hand, to extend the analogy, it's definitely not the case that the waiter actually makes the meal itself. In that case, adding more people (by analogy, threads) would genuinely speed things up.

To extend the analogy even further, if all the kitchen has is a four-burner stove, then there's a hard limit to how many people you can add to the kitchen staff before they run into the hard limit imposed by the stove size. Once you hit that limit, more kitchen staff will actually slow things down because they'll just be getting in each other's way because there's a hard limit to the number of things that can actually be cooking at once. No matter how big your kitchen staff is, you can't possibly have more than 4 items cooking on the stove at once. In this case, the number of cores you have is like the kitchen size; once you reach a certain point, adding more kitchen staff (threads) will detract from your performance (not enhance it).

  • Thanks for the explanation. Can you also suggest the best way to improve performance? Actually all these loops are independent of each other, and the elements can be created at the same time. With sequential loops, not using Parallel.Invoke(), the whole process takes 12 hours. What can I possibly do to reduce it without much overhead? – Akanksha Sep 12 '16 at 06:07
  • Just try not to Invoke inside Invoke. In the real world, if you have a performance problem that is CPU-bound and can be parallelized, do it once or don't do it at all. – eocron Sep 12 '16 at 06:32
  • @Akanksha Do you _have_ to process all the parent / child elements in one go like you're doing? Can you use some sort of lazy evaluation which only does the processing when it's required? Otherwise you have to seriously rethink your architecture. – auburg Sep 12 '16 at 13:06
  • @Akanksha See my edits and see if they're helpful. By "sequential loops," do you mean "sequential parallel foreach loops" (i.e. not using Parallel.invoke) or "using sequential 'normal' foreach loops"? – EJoshuaS - Stand with Ukraine Sep 12 '16 at 15:11
  • @eocron06 Which one is better to use, Parallel.ForEach or Parallel.Invoke, if I have to choose one? – Akanksha Sep 14 '16 at 08:49
  • @auburg I am parsing a file that has a parent and the number of children for that parent; from this file I have to create the elements and diagrams for EA, such that after the parent diagram is created, the child diagrams can be created inside it. – Akanksha Sep 14 '16 at 08:51
  • @EJoshuaS By sequential I mean foreach loops invoked by Parallel.Invoke(). I understood the situation from your explanation. I will try to implement it accordingly and get back. Thanks. – Akanksha Sep 14 '16 at 09:07

If you use an RDBMS-backed model, you are better off running some SQL against the model to get things done fast instead of using EA's API.

https://leanpub.com/InsideEA has a lot of details on the structure.

For example, with SQL Server you are going to be much faster with raw INSERTs than with walking through EA objects, not to mention JOINs to get data out fast.
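
A hedged sketch of what a raw insert looks like (the t_object column list below is from memory and must be checked against the schema of your EA repository version; connectionString is whatever points at your model database):

using System;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Illustrative only: insert an element row directly into EA's t_object table.
const string sql =
    "INSERT INTO t_object (Object_Type, Name, Package_ID, ea_guid, CreatedDate) " +
    "VALUES (@type, @name, @packageId, @guid, GETDATE())";

using var connection = new SqlConnection(connectionString);
using var command = new SqlCommand(sql, connection);
command.Parameters.AddWithValue("@type", "Class");
command.Parameters.AddWithValue("@name", "MyElement");
command.Parameters.AddWithValue("@packageId", 42);
command.Parameters.AddWithValue("@guid", "{" + Guid.NewGuid().ToString().ToUpper() + "}");

connection.Open();
command.ExecuteNonQuery();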

I have scripts that get close to 100x (or more) better performance with SQL than with the API.

I'm not sure the EA COM object can be invoked the way you want. And even if it can, model updates will still have to occur in some kind of sequence for Object_IDs to be assigned properly. This may explain why you run into some kind of locking limit.
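
If the expensive part is preparing the data rather than the EA calls themselves, one pattern (a sketch; PreparedElement, PrepareElement and WriteToEa are hypothetical names, not EA API members) is to parallelize only the preparation and keep the EA writes on a single thread:

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Do the pure in-memory preparation in parallel...
var prepared = new ConcurrentBag<PreparedElement>();
Parallel.ForEach(parents, parent =>
{
    prepared.Add(PrepareElement(parent));
});

// ...then apply the EA updates sequentially, so Object_IDs are assigned
// in a single, well-defined order.
foreach (var element in prepared)
{
    WriteToEa(element);
}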

philippeback
  • I wouldn't recommend doing SQL inserts, not even if you know the database structure very well. The API can be a bit slow, but if this is about creating elements (which we don't know, because the OP hasn't explained his problem) then there are pretty fast ways to create elements. The main trick is to avoid looping EA.Collections, and sometimes to turn off the GUI feedback temporarily (see the sketch after these comments). – Geert Bellekens Sep 13 '16 at 03:58
  • Well, crafting an XMI file with whatever scripting language and importing it all at once is a pretty decent way to go fast as well. Even better, export a sample and clone the template. As for "The API can be a bit slow": the API can be as slow as a very old dog. – philippeback Sep 13 '16 at 13:17
  • Most of the time I am now working with a side Access project using linked tables against the EA model, and I use that to clean/restructure/assess a ton of things fast. One example is detecting wrongly directed connectors and swapping them the right way. Very fast and easy with SQL Server SQL. I also have what I call a "hydratedRelationships" view, which is t_connector joined with t_object on start and end. It allows for a lot of fast queries all over the model in no time. Cleaning messy t_xref entries is also much easier to do with SQL than with the API. – philippeback Sep 13 '16 at 13:21
  • @philippeback Thanks, I am also planning to have a baseline template. – Akanksha Sep 14 '16 at 09:08
  • @GeertBellekens Thanks, I am trying to turn off the GUI feedback. Using SQL would not be a great idea; I would stick to the API, but I will try to copy and modify from a base template instead of creating new elements each time. – Akanksha Sep 14 '16 at 09:10
  • Being the author of the book you recommend I'd also not suggest to create things with direct SQL. If, like the OP, someone starts creating masses of elements, I'd throw in the answer: why? Each single element should have a meaning which humans can explain. For mass-created elements I doubt that. That said, I can imagine repair and merge mechanisms where this way is viable/needed. But in general: use the API. – qwerty_so Sep 14 '16 at 18:57
  • Another option is to use EaDocX, and more specifically EaXL, if you want to import a lot of things fast. This one is using the API under the hood. Export a package with the elements, attributes (even relationships) you want, paste the values you want to import into the resulting Excel file, compare for a sanity check, and import in a single shot. https://www.eadocx.com/help/index.html?eadocx_and_excel.htm (as we are speaking about add-ins, this would maybe make the add-in you are writing unnecessary). – philippeback Sep 25 '16 at 18:05
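
A sketch of the "turn off the GUI feedback" idea mentioned in the comments, using the EA automation API (BatchAppend, EnableUIUpdates and RefreshModelView are the Repository members I recall from the Sparx documentation; verify them against your EA version, and repository is the EA.Repository instance the add-in already holds):

// Tell EA that many inserts are coming and stop repainting the GUI per change.
repository.BatchAppend = true;
repository.EnableUIUpdates = false;

try
{
    // ... create the parents and children through the API here ...
}
finally
{
    repository.BatchAppend = false;
    repository.EnableUIUpdates = true;
    repository.RefreshModelView(0); // 0 is commonly used to refresh the whole model tree
}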