8

After reading this post: Nesting await in Parallel.ForEach

I tried to do the following:

private static async void Solution3UsingLinq()
{
    var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

    var customerTasks = ids.Select(async i =>
    {
        ICustomerRepo repo = new CustomerRepo();
        var id = await repo.getCustomer(i);
        Console.WriteLine(id);

    });
}

For some reason, this doesn't work... I don't understand why, I think there is a deadlock but i'm not sure...

Ariel Volnicher
  • 153
  • 1
  • 1
  • 8
  • As a side note: The C# style guide mandates that all method names in C# should start with a capital letter. Also, `async` methods should carry the suffix `Async`. So, `getCustomer` should actually be called `GetCustomerAsync`. – spender Mar 07 '18 at 00:55
  • @spender you are absolutely correct, this was just a sample code I wrote quickly to check how await behaves in Linq.... – Ariel Volnicher Mar 07 '18 at 11:02
  • 1
    Once again: the result of a query expression is an object that represents a query, **not the results of the query**. You obtain the results by *executing* the query. – Eric Lippert Mar 07 '18 at 14:58
  • @EricLippert thanks, I understand that now, So all I did is just added a "ToList" to the end of the query, which made it run. What I don't understand, why is that not a valid solution? why do I have to use "WhenAll" ? just to properly obverse the exceptions ? – Ariel Volnicher Mar 07 '18 at 22:18
  • 2
    Because under no circumstances should you do what you did. **Select is not foreach**. If you want to do some action on every element of a sequence then that's a `foreach` loop, with the action in the body of a loop. A select should *select something*, not *cause an effect*. – Eric Lippert Mar 07 '18 at 22:21
  • The final code snippet in the answer of Harald Coppoolse is the correct way to do this: the selection produces a sequence of values, and then the effects are produced by the `foreach` loop. – Eric Lippert Mar 07 '18 at 22:28
  • @EricLippert is using an await inside a Select a mistake ? I know it works, but if I understand your comment, it is a bad practice. However, I am currently reading stephan cleary book "Concurrency ni C#", and in one of his examples (processing tasks as the complete) he implemented the solution by using an await inside a select. If I could I would post it here in the comment but I don't know how... – Ariel Volnicher Mar 08 '18 at 18:32
  • Bizarre comment. `Select` can return the result of a function, which in turn might have to select further data from an external service, which would be async... – jenson-button-event May 06 '21 at 11:27

2 Answers2

22

So at the end of your method, customerTasks contains an IEnumerable<Task> that has not been enumerated. None of the code within the Select even runs.

When creating tasks like this, it's probably safer to materialize your sequence immediately to mitigate the risk of double enumeration (and creating a second batch of tasks by accident). You can do this by calling ToList on your sequence.

So:

var customerTasks = ids.Select(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
    Console.WriteLine(id);

}).ToList();

Now... what to do with your list of tasks? You need to wait for them all to complete...

You can do this with Task.WhenAll:

await Task.WhenAll(customerTasks);

You could take this a step further by actually returning a value from your async delegate in the Select statement, so you end up with an IEnumerable<Task<Customer>>.

Then you can use a different overload of Task.WhenAll:

IEnumerable<Task<Customer>> customerTasks = ids.Select(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var c = await repo.getCustomer(i); //consider changing to GetCustomerAsync
    return c;

}).ToList();

Customer[] customers = await Task.WhenAll(customerTasks); //look... all the customers

Of course, there are probably more efficient means of getting several customers in one go, but that would be for a different question.

If instead, you'd like to perform your async tasks in sequence then:

var customerTasks = ids.Select(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var id = await repo.getCustomer(i); //consider changing to GetCustomerAsync
    Console.WriteLine(id);

});
foreach(var task in customerTasks) //items in sequence will be materialized one-by-one
{
    await task;
}
spender
  • 117,338
  • 33
  • 229
  • 351
  • 2
    Doing a `ToList` will execute them all in parallel, might not be was was intended. Also for your 2nd query there is no need to the select to be async in this case, just return the result of `getCustomer` – Magnus Mar 07 '18 at 09:28
  • @Magnus can you explain why "ToList" will run a select in parallel ? so what will "AsParallel" do ? I think I a missing something here – Ariel Volnicher Mar 07 '18 at 13:36
  • @spender can you explain in more detail what was wrong with using the await inside the "Select" ? Why can't I wait for each task and as it finished write it to the console ? why did you have to split it and wait for all of the tasks to complete using "WhenAll" and only then go over them and write their results to the console ? – Ariel Volnicher Mar 07 '18 at 13:39
  • @ArielVolnicher Doing a ToList will enumerate the query and put the result task into a list. By doing that all tasks will be started (and executed in parallel) – Magnus Mar 07 '18 at 13:48
  • @spender I apologize but I still don't understand your edit, in your edit we use an await inside the select query. The effect will be, if I understand correctly,That the tasks will start concurrently and when each of them complete the Console.Wrtieline will called. so why do I need to go over the tasks again with a forearch ? – Ariel Volnicher Mar 08 '18 at 18:39
  • Thank you for pointing out the overload of `Task.WhenAll`. I was about to `Task.WhenAll(customerTasks)` followed by `customerTasks.Select(x => x.Result).ToArray()`. I came here looking for a cleaner solution, as using Task.Result is usually a bad idea. – Inrego Oct 11 '22 at 07:46
3

Addition:

There seems to be some confusion about when the the LINQ statements actually are executed, especially the Where statement.
I created a small program to show when the source data actually is accessed Results at the end of this answer

end of Addition

You have to be aware about the lazyness of most LINQ functions.

Lazy LINQ functions will only change the Enumerator that IEnumerable.GetEnumerator() will return when you start enumerating. Hence, as long as you call lazy LINQ functions the query isn't executed.

Only when you starte enumerating, the query is executed. Enumerating starts when you call foreach, or non-layzy LINQ functions like ToList(), Any(), FirstOrDefault(), Max(), etc.

In the comments section of every LINQ function is described whether the function is lazy or not. You can also see whether the function is lazy by inspecting the return value. If it returns an IEnumerable<...> (or IQueryable) the LINQ is not enumerated yet.

The nice thing about this lazyness, is that as long as you use only lazy functions, changing the LINQ expression is not time consuming. Only when you use non-lazy functions, you have to be aware of the impact.

For instance, if fetching the first element of a sequence takes a long time to calculate, because of Ordering, Grouping, Database queries etc, make sure you don't start enumerating more then once (= don't use non-lazy functions for the same sequence more than once)

Don't do this at home:

Suppose you have the following query

var query = toDoLists
    .Where(todo => todo.Person == me)
    .GroupBy(todo => todo.Priority)
    .Select(todoGroup => new
    {
        Priority = todoGroup.Key,
        Hours = todoGroup.Select(todo => todo.ExpectedWorkTime).Sum(),
     }
     .OrderByDescending(work => work.Priority)
     .ThenBy(work => work.WorkCount);

This query contains only lazy LINQ functions. After all these statement, the todoLists have not been accessed yet.

But as soon as you get the first element of the resulting sequence all elements have to be accessed (probably more than once) to group them by priority, calculate the total number of involved working hours and to sort them by descending priority.

This is the case for Any(), and again for First():

if (query.Any())                           // do grouping, summing, ordering
{
    var highestOnTodoList = query.First(); // do all work again
    Process(highestOnTodoList);
}
else
{   // nothing to do
    GoFishing();
}

In such cases it is better to use the correct function:

var highestOnToDoList = query.FirstOrDefault(); // do grouping / summing/ ordering
if (highestOnTioDoList != null)
   etc.

back to your question

The Enumerable.Select statement only created an IEnumerable object for you. You forgot to enumerate over it.

Besides you constructed your CustomerRepo several times. Was that intended?

ICustomerRepo repo = new CustomerRepo();
IEnumerable<Task<CustomerRepo>> query = ids.Select(id => repo.getCustomer(i));

foreach (var task in query)
{
     id = await task;
     Console.WriteLine(id);
}

Addition: when are the LINQ statements executed?

I created a small program to test when a LINQ statement is executed, especially when a Where is executed.

A function that returns an IEnumerable:

IEnumerable<int> GetNumbers()
{
    for (int i=0; i<10; ++i)
    {
        yield return i;
    }
}

A program that uses this enumeration using an old fashioned Enumerator

public static void Main()
{
    IEnumerable<int> number = GetNumbers();
    IEnumerable<int> smallNumbers = numbers.Where(number => number < 3);

    IEnumerator<int> smallEnumerator = smallNumbers.GetEnumerator();

    bool smallNumberAvailable = smallEnumerator.MoveNext();
    while (smallNumberAvailable)
    {
        int smallNumber = smallEnumerator.Current;
        Console.WriteLine(smallNumber);
        smallNumberAvailable = smallEnumerator.MoveNext();
    }
}

During debugging I can see that the first time GetNumbers is executed the first time that MoveNext() is called. GetNumbers() is executed until the first yield return statement.

Every time that MoveNext() is called the statements after the yield return are performed until the next yield return is executed.

Changing the code such that the enumerator is accessed using foreach, Any(), FirstOrDefault(), ToDictionary, etc, shows that the calls to these functions are the time that the originating source is actually accessed.

if (smallNumbers.Any())
{
    int x = smallNumbers.First();
    Console.WriteLine(x);
}

Debugging shows that the originating source starts enumerating from the beginning twice. So indeed, it is not wise to do this, especially if you need to do a lot to calculate the first element (GroupBy, OrderBy, Database access, etc)

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116
  • what do you mean I constructed the Repo several times ? that was of course not intended, you mean because it is in the select ? – Ariel Volnicher Mar 07 '18 at 13:50
  • Also, in the example you gave, you called the "Sum()" , isn't this a non-lazy functions ? – Ariel Volnicher Mar 07 '18 at 13:51
  • constructing several time: yes, because the code in the selector parameter of Enumerable.Select will be executed once per input item of the Select function – Harald Coppoolse Mar 07 '18 at 15:05
  • Sum() is indeed non-lazy. In the example the sum is within a Select. Select is Lazy. So as long as you don't ask for the first element of the items in the result, the enumerator of ToDoLists is not fetched, and thus the Where is not executed and all following methods are also not executed. So the Select is not executed and thus the Sum is not executed. In my example everything is summed and Grouped and Ordered when I call Any(), and summed and grouped and Ordered again when I call First(). To be exact: the functions are performed every time the first element of the sequence is asked for – Harald Coppoolse Mar 07 '18 at 15:13
  • Though your comment above gets the idea across reasonably well, I think it's important to note that of course Where is executed. It has to be; there's a call to it right there. `Where` starts, it runs for a while, and it returns a query object representing the filter. What it does *not* do is *enumerate the query object which is its first argument*, and nor does it *enumerate the query object which it returns*. But `Where` surely runs eagerly. – Eric Lippert Mar 07 '18 at 22:39
  • @EricLippert so if I understand what you are saying, the query that harald wrote will run the Where ? where is not lazy ? – Ariel Volnicher Mar 08 '18 at 21:03
  • @ArielVolnicher: The *function* `Where` runs, and it returns a lazily-enumerated object. I was pointing out that it is misleading to say that `Where` doesn't run. It runs; it must run! It *returns an object that represents the query*. – Eric Lippert Mar 08 '18 at 21:32
  • Of course all functions are executed when they are called, that is: executed in the sense that the resulting Enumerator is changed (in IQueryable: the Expression is changed). What I meant was that the data was not enumerated yet, and thus the code you put in the Where – Harald Coppoolse Mar 11 '18 at 14:07