3

I have this:

Parallel.ForEach(numbers, (number) =>
{
    var value = Regex.Replace(number, @"\s+", "%20");

    tasks.Add(client.GetAsync(url + value));
});

await Task.WhenAll(tasks).ConfigureAwait(false);

foreach (var task in tasks)
{
  ...
}

Sometimes returns less tasks when reaching the foreach(var task in tasks), but after a few requests, starts returning all the tasks.

Ive changed the ConfigureAwait to true and still sometimes returns less tasks.

BTW Im using Parallel.ForEach beacuse each client.GetAsync(url + value) its a request to an external api with the particularity that its latency SLA is lower than 1s for 99% of its requests

Can you guys explain me why it returns less tasks sometimes?

And is there a way to guarantee returning always all tasks?

Thanks

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
MarchalPT
  • 1,268
  • 9
  • 28
  • 7
    I bet you're using `System.Collections.Generic.List` for `tasks`.This collection is not thread-safe. You must use a thread-safe collection. See [System.Collections.Concurrent](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent?view=net-5.0) namespace – Alexander Petrov Mar 14 '21 at 10:51
  • And in general, there is no point in using `Parallel.ForEach` in your case. Just run asynchronous tasks. – Alexander Petrov Mar 14 '21 at 10:53
  • 1
    I don't think you need `Parallel.ForEach`. Using async, the documents will be downloaded in parallel anyway. – vc 74 Mar 14 '21 at 10:53
  • 1
    What's the point of the `Parallel.ForEach`? You're not performing **any** sort of work inside of it. Just use a normal `foreach` loop to add all your tasks to a list. Then you won't hit the issue describes in the above comment. – pinkfloydx33 Mar 14 '21 at 10:54
  • I have a version without Parallel.ForEach and its 3 times slower then this one, each client.GetAsync(url + value) it does it waits for less then 1s for 99% of cases, thats why I want parallel use – MarchalPT Mar 14 '21 at 10:56
  • Im making a get and waiting for an external response inside the parallel foreach, if I dont use parallel then it will be awaiting for each response – MarchalPT Mar 14 '21 at 10:58
  • Add an async method that also contains the `Regex.Replace()` stuff, add all these to a collection of Tasks and then `await Task.WhenAll()`. The use of `.ConfigureAwait(false)` may depend on what runs this code, but this context is missing. – Jimi Mar 14 '21 at 10:59
  • 1
    No, you can collect the tasks in a local list and then call `WhenAll` – vc 74 Mar 14 '21 at 10:59
  • 1
    You're not awaiting client.GetAsync so it should return almost instantly, so I dunno how your claim makes any sense. Ditch the Parallel or switch to a threadsafe collection (which will then have its own minimal impact) – pinkfloydx33 Mar 14 '21 at 11:12
  • @pinkfloydx33 Yes Im waiting for all client.GetAsync in await Task.WhenAll(tasks).ConfigureAwait(false); – MarchalPT Mar 14 '21 at 11:13
  • @Jimi Im running on a api with a single controller and a single service – MarchalPT Mar 14 '21 at 11:14
  • @AlexanderPetrov Im using the collection you mentioned and seems working for now.. – MarchalPT Mar 14 '21 at 11:15
  • That I understand. But you aren't doing it within the Parallel.ForEach so your claims about it taking extra time or that if you don't use it that it will wait for each response, don't hold water – pinkfloydx33 Mar 14 '21 at 11:16
  • @pinkfloydx33 within the parallel foreach when I do tasks.Add(client.GetAsync(url + value)); it sends the request to the external api right away, then after the parallel foreach im waitning for all responses, its requests times time to be answered by the external api if I do await client.GetAsync(url + value) inside a normal foreach means it will be awaiting there for the response before proceding... – MarchalPT Mar 14 '21 at 11:20
  • Then `.ConfigureAwait(false)` makes sense. As mentioned, IMO you should use a method that includes the Regex part (which is what makes you use a paraller `For`) and create a List of Tasks that run this method. So `Parallel.ForEach()` is no longer needed. – Jimi Mar 14 '21 at 11:20
  • `requests times time to be answered by the external api if I do await client.GetAsync(url + value) inside a normal foreach means it will be awaiting there for the response before proceding` This is simply not true. If it *is* true then something's wrong with `GetAsync` code. Async code, at the very bottom of the chain, is supposed to setup an async IO callback, and bail out. This happens almost instantly. When the IO returns, the callback is called and the thread continues. If you don't `await` and instead accumulate in a `List` to `await WhenAll` together, all calls run simultaneously – Charlieface Mar 14 '21 at 12:03
  • Regarding the method `client.GetAsync`, is this the [`HttpClient.GetAsync`](https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient.getasync) method? If not, could you edit the question and include the code of this method? – Theodor Zoulias Mar 14 '21 at 14:00
  • @TheodorZoulias yes it is – MarchalPT Mar 14 '21 at 15:21
  • OK, forget about the `Parallel.ForEach` then. The `HttpClient.GetAsync` is not a CPU-heavy method. It returns an incomplete `Task` practically instantly. Your concern now should be whether you are overloading the remote server with too many concurrent requests. In case you are interested about ways to limit the concurrency of asynchronous I/O-bound operations, you can take a look [here](https://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations). – Theodor Zoulias Mar 14 '21 at 15:48

2 Answers2

8

And is there a way to guarantee returning always all tasks?

Several people in the comments are pointing out you should just do this, on the assumption that numbers is a non-threadsafe List:

    foreach(var number in numbers)
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        tasks.Add(client.GetAsync(url + value));
    }

    await Task.WhenAll(tasks).ConfigureAwait(false);

    foreach (var task in tasks)
    {
      ...
    }

There doesn't seem to be any considerable benefit in parallelizing the creation of the tasks that do the download; this happens very quickly. The waiting for the downloads to complete is done in the WhenAll

ps; there are a variety of more involved ways to escaping data for a URL, but if you're specifically looking to convert any kind of whitespace to %20, I guess it makes sense to do it with regex..

Edit; you asked when to use a Parallel ForEach, and I'm going to say "don't, generally, because you have to be more careful about th contexts within which you use it", but if you made the Parallel.ForEach do more syncronous work, it might make sense:

    Parallel.ForEach(numbers, number =>
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        var r = client.Get(url + value));

        //do something meaningful with r here, i.e. whatever ... is in your  foreach (var task in tasks)

    });

but be mindful if you're performing updates to some shared thing, for coordination purposes, from within the body then it'll need to be threadsafe

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • Going to try it now, with concurrent collection right? – MarchalPT Mar 14 '21 at 11:25
  • 2
    No need for concurrent collection; nothing is happening with it concurrently; it happens sequentially in the `foreach`. The most critical part is that you don't `await` in the foreach (which you don't, but I'm saying "don't be tempted to add it"), otherwise the IO of the requests *will* be done sequentially – Caius Jard Mar 14 '21 at 11:26
  • Its working for now, all tasks are being returned with the same speed as the parallel foreach and with no cuncurrent collection, will do some more tests and then accept this answear:) – MarchalPT Mar 14 '21 at 11:31
  • By the way, in a simple way can you tell me when should I use Parallel foreach then? – MarchalPT Mar 14 '21 at 11:31
  • Spent the last five minutes typing my answer on my phone so didn't see your answer. Leaving mine but providing an upvote – pinkfloydx33 Mar 14 '21 at 11:37
  • 2
    @pinkfloydx33 I CW'd it anyway, as it was really only visualizing what everyone else was saying so I didn't feel like it was "my answer".. but thanks! :) – Caius Jard Mar 14 '21 at 11:56
  • 1
    @MarchalPT you should use `Parallel.ForEach` when you have CPU-bound work to do, and you want to use multiple cores of the machine to speed things up. This method is a bit primitive, it has a problematic default degree of parallelism, and in general is unlikely to throw you in the pit of success, unless you know very well what you are doing. A safer tool for doing parallel work is the PLINQ. It is like LINQ, but starts with `.AsParallel()`. For example: `Task[] tasks = urls.AsParallel().AsOrdered().Select(url => client.GetAsync(url)).ToArray();` – Theodor Zoulias Mar 14 '21 at 14:13
5

You haven't shown it, so we can only guess but I assume that tasks is a List<>. This collection type is not thread-safe; your parallel loop is likely "overwriting" values. Either perform manual locking of your list or switch to a thread-safe collection such as a ConcurrentQueue<>

var tasks = new ConcurrentQueue<Task<string>>();

Parallel.ForEach(numbers, number =>
{
    var value = Regex.Replace(number, @"\s+", "%20");
    tasks.Enqueue(client.GetAsync(url + value));
});

await Task.WhenAll(tasks.ToArray()).ConfigureAwait(false);

foreach (var task in tasks)
{
   // whatever 
}

That said, your use of Parallel.ForEach is quite suspect. You aren't performing anything of real significance inside the loop. Use of Parallel, especially with proper locking, likely has higher overhead negating any potential gains you claim to observe or are realized by paralellizing the Regex calls. I would convert this to a normal foreach loop and precompile the Regex to offset (some of) its overhead:

// in class
private static readonly Regex SpaceRegex = new Regex(@"\s+", RegexOptions.Compiled);

// in method
var tasks = new List<Task<string>>();

foreach (var number in numbers)
{
    var value = SpaceRegex.Replace(number, "%20");
    tasks.Add(client.GetAsync(url + value));
}

await Task.WhenAll(tasks).ConfigureAwait(false);

foreach (var task in tasks)
{
   // whatever 
}

Alternatively, don't use a regex at all. Use a proper Uri escaping mechanism which will have the added benefit of fixing more than just spaces:

var value = Uri.EscapeDataString(number);
// or
var fullUri = Uri.EscapeUriString(url + number);

Note there are two different methods there. The proper one to use depends on the values of url and number. There's also other mechanisms such as the HttpUtility.UrlEncode method... but I think these are the preferred ones.

pinkfloydx33
  • 11,863
  • 3
  • 46
  • 63
  • Loved the way you did with the Regex – MarchalPT Mar 14 '21 at 11:39
  • going to try it now,is it valid with url with https://? – MarchalPT Mar 14 '21 at 11:49
  • 1
    @MarchalPT see example: https://tio.run/##bc29DsIgFAXgvU9BOpUYIeDPIHFSN7fGOCMhSlKg4VKTPj1CipNuNyfnfFfBWoFJ6S0DcpN96ICOqGV8g7a7fSuakk9hKOErxhEOlNqZgImaKG9pbiwVlhv9DFFbcguGXEDJUeerj8G4Z1eIVX2AkWhO3oEfNLmHLF2N093E8Nfi2VoGP@JZRlnJiv21OBYpfQA – pinkfloydx33 Mar 14 '21 at 11:58
  • Downvoting solely for suggesting the [very specialized](https://stackoverflow.com/questions/15400133/when-to-use-blockingcollection-and-when-concurrentbag-instead-of-listt/64823123#64823123) `ConcurrentBag` class as a container. This class is intended for mixed producer-consumer scenarios, that are extremely rare. A better option for storing the results of a `Parallel.ForEach` loop is the `ConcurrentQueue`. – Theodor Zoulias Mar 14 '21 at 13:55
  • @TheodorZoulias happy now? – pinkfloydx33 Mar 14 '21 at 14:10
  • Yeap, much happier! I have declared my small personal war against the `ConcurrentBag` class, the clunky "concurrent `List`" impostor, and every little victory matters. – Theodor Zoulias Mar 14 '21 at 14:17