10

I have the below which calculates the running total for a customer account status, however he first value is always added to itself and I'm not sure why - though I suspect I've missed something obvious:

    decimal? runningTotal = 0;
    IEnumerable<StatementModel> statement = sage.Repository<FDSSLTransactionHistory>()
        .Queryable()
        .Where(x => x.CustomerAccountNumber == sageAccount)
        .OrderBy(x=>x.UniqueReferenceNumber)
        .AsEnumerable()
        .Select(x => new StatementModel()
        {
            SLAccountId = x.CustomerAccountNumber,
            TransactionReference = x.TransactionReference,
            SecondReference = x.SecondReference,
            Currency = x.CurrencyCode,
            Value = x.GoodsValueInAccountCurrency,
            TransactionDate = x.TransactionDate,
            TransactionType = x.TransactionType,
            TransactionDescription = x.TransactionTypeName,
            Status = x.Status,
            RunningTotal = (runningTotal += x.GoodsValueInAccountCurrency)
        });

Which outputs:

29/02/2012 00:00:00 154.80  309.60  
30/04/2012 00:00:00 242.40  552.00  
30/04/2012 00:00:00 242.40  794.40  
30/04/2012 00:00:00 117.60  912.00  

Where the 309.60 of the first row should be simply 154.80

What have I done wrong?

EDIT: As per ahruss's comment below, I was calling Any() on the result in my View, causing the first to be evaluated twice - to resolve I appended ToList() to my query.

Thanks all for your suggestions

SWa
  • 4,343
  • 23
  • 40
  • Good question. +1. I can't see why this could be happening. Is there anything we're missing here? How are you outputting it? If I were you I'd start by breaking down the query for debugging purposes. Bring the `Select` onto a new line, check what the `Where` is bringing back first. –  Jul 30 '14 at 12:42
  • 2
    Can you give some example data for what you run your LINQ against? – Mare Infinitus Jul 30 '14 at 12:43
  • 1
    Can you post a code(complete program) which reproduces the issue? – Sriram Sakthivel Jul 30 '14 at 12:44
  • Will do, bear with me - didn't quite expect this response :D – SWa Jul 30 '14 at 12:50
  • 3
    Is this the only place you modify `runningTotal`? I guess no. I tried something similar which works perfectly. Your question is answerable only if you provide a repro.. – Sriram Sakthivel Jul 30 '14 at 12:51
  • 1
    I tried a sample (without your Repository obviously), and was not able to reproduce it. Try incrementing runningTotal in a function `runningTotal += Increment(x)` that just returns GoodsValueInAccountCurrency. Using a breakpoint you can see if the first value indeed gets called twice. It does not seem to be the case. – Troels Larsen Jul 30 '14 at 12:54
  • 1
    I cannot reproduce it either – AD.Net Jul 30 '14 at 12:56
  • 1
    You should read [this](http://stackoverflow.com/questions/6386184/c-sharp-paradigms-side-effects-on-lists). LINQ with side effects is usually a bad idea. I would do the projection without the running total and then afterwards go back through in a foreach and update the records. – ahruss Jul 30 '14 at 13:02
  • 3
    My guess would be somewhere you are calling `Any()` or `First()`, or something like that before you actually consume the list. That could make the first one be evaluated twice. You *could* fix that, if it is the problem, by adding a `.ToList()` to the end. The better solution is really just to avoid the abuse of LINQ, though. You get no guarantees about what order things will be called when you use LINQ, or how many times. It should be stateless. – ahruss Jul 30 '14 at 13:08
  • @ahruss that got it! I'm calling `Any()` in my View. Your suggestion resolves it. Could you please post as an answer? – SWa Jul 30 '14 at 13:11
  • 1
    Simple error. Solution has nothing to do with the code in the question. But it got solved :-D – Mare Infinitus Jul 30 '14 at 13:16

2 Answers2

7

Add a ToList() to the end of the call to avoid duplicate invocations of the selector.

This is a stateful LINQ query with side-effects, which is by nature unpredictable. Somewhere else in the code, you called something that caused the first element to be evaluated, like First() or Any(). In general, it is dangerous to have side-effects in LINQ queries, and when you find yourself needing them, it's time to think about whether or not it should just be a foreach instead.

Edit, or Why is this happening?

This is a result of how LINQ queries are evaluated: until you actually use the results of a query, nothing really happens to the collection. It doesn't evaluate any of the elements. Instead, it stores Abstract Expression Trees or just the delegates it needs to evaluate the query. Then, it evaluates those only when the results are needed, and unless you explicitly store the results, they're thrown away afterwards, and re-evaluated the next time.

So this makes the question why does it have different results each time? The answer is that runningTotal is only initialized the first time around. After that, its value is whatever it was after the last execution of the query, which can lead to strange results.

This means the question could just have easily have been "Why is the total always twice what it should be?" if the asker were doing something like this:

Console.WriteLine(statement.Count()); // this enumerates all the elements!
foreach (var item in statement) { Console.WriteLine(item.Total); }

Because the only way to get the number of elements in the sequence is to actually evaluate all of them.

Similarly, what actually happened in this question was that somewhere there was code like this:

if (statement.Any()) // this actually involves getting the first result
{ 
    // do something with the statement
}
// ...
foreach (var item in statement) { Console.WriteLine(item.Total); }

It seems innocuous, but if you know how LINQ and IEnumerable work, you know that .Any() is basically the same as .GetEnumerator().MoveNext(), which makes it more obvious that it requires getting the first element.

It all boils down to the fact that LINQ is based on deferred execution, which is why the solution is to use ToList, which circumvents that and forces immediate execution.

Community
  • 1
  • 1
ahruss
  • 2,070
  • 16
  • 21
  • Brilliant question and brilliant answer (+1), but (for my sake) could you explain how evaluating the first element would cause the `Select` to fire (including the increment of `runningTotal`) if `Any()` was called independently? Struggling to understand this one, and I'd like to avoid it in the future. –  Jul 30 '14 at 13:27
  • Deferred execution is the root cause then? Perhaps that should be part of the answer? – Mare Infinitus Jul 30 '14 at 13:41
  • 1
    @DeeMac If by "independently" you mean as a separate query of the same source (`sage.Repository`), then it won't fire. But `statement.Any()` would not be independent because `statement` is the select query itself -- any request to enumerate it will be fulfilled by executing the selector. – nmclean Jul 30 '14 at 13:52
  • Ohhh I see. I didn't realise Linq worked in this way. I didn't realise that to define a `Select` meant this new 'definition' would be used elsewhere. –  Jul 30 '14 at 14:09
  • @DeeMac I expanded my answer to explain some more. – ahruss Jul 30 '14 at 14:24
  • @ahruss Your answer is now way better! Think this is valuable to many people! But the "solution" with `ToList` is somewhat tongue-in-cheek – Mare Infinitus Jul 30 '14 at 14:48
  • @MareInfinitus `ToList` is probably the practical solution for the OP's situation, but you're right that there is a disadvantage in losing the possibility of deferred execution. I added an answer showing the alternative. – nmclean Jul 31 '14 at 12:31
  • @ahruss foreach over a LINQ expression has some very nice effects due to deferred execution. One should be careful with this. (sorry, can't find an example quick enough now where a foreach prints all identical values for the items) – Mare Infinitus Jul 31 '14 at 13:00
  • @ahruss Here we go: http://stackoverflow.com/questions/14907987/access-to-foreach-variable-in-closure Think this is exactly what your code does, see second example in the accepted answer with 5, 5, 5, 5, 5 – Mare Infinitus Jul 31 '14 at 13:20
  • @MareInfinitus That doesn't apply here... There is no closure in a foreach – ahruss Jul 31 '14 at 13:29
  • @MareInfinitus `foreach` over a query is no problem; the opposite is the problem: a query that refers to a `foreach` variable. – nmclean Jul 31 '14 at 13:42
1

If you don't want to freeze the results with ToList, a solution to the outer scope variable problem is using an iterator function, like this:

IEnumerable<StatementModel> GetStatement(IEnumerable<DataObject> source) {
    decimal runningTotal = 0;
    foreach (var x in source) {
        yield return new StatementModel() {
            ...
            RunningTotal = (runningTotal += x.GoodsValueInAccountCurrency)
        };
    }
}

Then pass to this function the source query (not including the Select):

var statement = GetStatement(sage.Repository...AsEnumerable());

Now it is safe to enumerate statement multiple times. Basically, this creates an enumerable that re-executes this entire block on each enumeration, as opposed to executing a selector (which equates to only the foreach part) -- so runningTotal will be reset.

nmclean
  • 7,564
  • 2
  • 28
  • 37