Is it bad practice to purposely rely on Linq Side Effects?

Question

A programming pattern like this comes up every so often:

int staleCount = 0;

fileUpdatesGridView.DataSource = MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);

        if (merger.TargetIsStale)
            staleCount++;

        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();

fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = staleCount > 0;

I'm not sure there is a more succinct way to code this?

Even if so, is it bad practice to do this?

You are consuming the enumeration, assuring it has executed and will not execute again - so as long as you reset `staleCount` before doing so, there is nothing wrong with it. — SimpleVar, Aug 14 '15 at 00:40
The side effect is modifying staleCount which is "outside" the linq pipeline and doing so in a Select() rather than a Foreach(). — crokusek, Aug 14 '15 at 00:43
This is example/stub/hypothetical code and would be closed as such if posted to [Code Review](http://codereview.stackexchange.com/help). [Code Review](http://codereview.stackexchange.com/help) expects concrete, working code. I recommend you peruse the [help center](http://codereview.stackexchange.com/help) before making more recommendations, @SamAxe. — nhgrif, Aug 14 '15 at 00:44
It will all go rotten when you throw an `.AsParallel()` into the mix. — spender, Aug 14 '15 at 00:44
I'm voting to close this question as off-topic because it is not a specific programming question, but instead a poll. — nhgrif, Aug 14 '15 at 00:46
So tons of this code starts to brew for years because we have to go search somebody's personal blog to gauge a sentiment? Does it just need re-wording? — crokusek, Aug 14 '15 at 00:49
@crokusek If you wrote some actual concrete code and were interested in some feedback on it, you could potentially wrangle this into an on-topic [Code Review](http://codereview.stackexchange.com/help) question. But Stack Overflow isn't a good fit for this sort of question. That's not what Stack Overflow is about. And we're not really here to recommend you where to get the information you seek. Just saying that this question, as-is, isn't a fit for Stack Overflow's format. — nhgrif, Aug 14 '15 at 00:54
If I say "is it bad practice" instead of a "scale of 1 to 10" then would it be okay? — crokusek, Aug 14 '15 at 01:01
@crokusek probably no, it is still completely opinion based. Unless particular style of code is completely unacceptable (not your case) such questions rarely can be answered with definite statement. You can easily turn that question into SO-compatible one if you actually have particular goal: "how to avoid side effects in ..." or "is this code safe to make parallel"/"make parallel friendly". — Alexei Levenkov, Aug 14 '15 at 01:12
I really want to know whether I should be avoiding side effects. Maybe I just need to add a [comic strip](http://stackoverflow.com/questions/11906056/goto-is-this-bad) as this similar case lives on. There are other "is goto bad practice" cases that were closed but at least they got some nice answers before they were closed. — crokusek, Aug 14 '15 at 01:19
Recent controversy and why this is an important question: [Ensure the selector gets run during Count](https://github.com/dotnet/corefx/pull/14435) — crokusek, May 26 '17 at 02:15

Alexei Levenkov · Accepted Answer · 2015-08-14T02:17:01.620

No, it is not strictly "bad practice" (like constructing SQL queries with string concatenation of user input or using goto).

Sometimes such code is more readable than several queries/foreach or no-side-effect Aggregate call. Also it is good idea to at least try to write foreach and no-side-effect versions to see which one is more readable/easier to prove correctness.

Please note that:

it is frequently very hard to reason what/when will happen with such code. I.e. you sample hacks around the fact of LINQ queries executed lazily with .ToList() call, otherwise that value will not be computed.
pure functions can be run in parallel, once with side effects need a lot of care to do so
if you ever need to convert LINQ-to-Object to LINQ-to-SQL you have to rewrite such queries
generally LINQ queries favor functional programming style without side-effects (and hence by convention readers would not expect side-effects in the code).

I would say that this is terrible practice because the LINQ implementation does not guarantee that the enumeration behavior won't be changed or optimised in the future. While .ToList() will probably continue to force enumeration of all elements, .NET Core is already optimising .Count() and .Skip(). These will already reduce the number of elements iterated from the original collection and introduce a breaking change if side effects are relied upon. — Ryan, May 18 '21 at 05:06

Robert McKee · Answer 2 · 2015-08-14T15:23:55.460

Why not just code it like this:

var result=MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);

        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.IsStale);

I would consider this a bad practice. You are making the assumption that the lambda expression is being forced to execute because you called ToList. That's an implementation detail of the current version of ToList. What if ToList in .NET 7.x is changed to return an object that semi-lazily converts the IQueryable? What if it's changed to run the lambda in parallel? All of a sudden you have concurrency issues on your staleCount. As far as I know, both of those are possibilities which would break your code because of bad assumptions your code is making.

Now as far as repeatedly calling MultiMerger.GetMerger with a single id, that really should be reworked to be a join as the logic for doing a join (w|c)ould be much more efficient than what you have coded there and would scale a lot better, especially if the implementation of MultiMerger is actually pulling data from a database (or might be changed to do so).

As far as calling ToList() before passing it to the Datasource, if the Datasource doesn't use all the fields in your new object, you would be (much) faster and take less memory to skip the ToList and let the datasource only pull the fields it needs. What you've done is highly couple the data to the exact requirements of the view, which should be avoided where possible. An example would be what if you all of a sudden need to display a field that exists in FileDatabaseMerger, but isn't in your current anonymous object? Now you have to make changes to both the controller and view to add it, where if you just passed in an IQueryable, you would only have to change the view. Again, faster, less memory, more flexible, and more maintainable.

Hope this helps.. And this question really should be posted of code review, not stackoverflow.

Update on further review, the following code would be much better:

var result=MultiMerger.GetMergersByIds(MultiMerger.TargetIds);
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.TargetIsStale);

or

var result=MultiMerger.GetMergers().Where(m=>MultiMerger.TargetIds.Contains(m.Id));
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.TargetIsStale);

Any() includes the records that are non-stale so this wouldn't work. — crokusek, Aug 14 '15 at 00:52
Although, it looks like you should be using a left join rather than your function. — Robert McKee, Aug 14 '15 at 00:54
It is a bad practice, yes. If you ran your code through a good code analyzer it would likely flag it too for a number of reasons... The select is run lazily, but then you force it to run by using `ToList`. I wouldn't call `ToList` either before passing it into the datasource. Let it get the query, in case it wants to do things like sorting and paging, etc. You've defeated the laziness, you've implemented a left join manually (likely not as efficient), and then you have the variable that is calculated in a lazy function and using it outside the query scope. — Robert McKee, Aug 14 '15 at 01:20
Yes, aware of the delayed exec but have other reasons. Wouldn't your example break without the ToList() because it would need to be executed twice? — crokusek, Aug 14 '15 at 01:31
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/86941/discussion-between-crokusek-and-robert-mckee). — crokusek, Aug 14 '15 at 01:43
Based on the headline of the question, it is clear that the "programming pattern" of using Linq side effects in general need not include use of a DataSource. I apologize for providing an example that invites critique diverging from the headline. But thanks for not letting issues slide by. — crokusek, Aug 14 '15 at 17:51
The Linq will be executed twice with your last two examples (DataBind + Any()) since there no materialization with ToList(). There is no GetMergers() or GetMergersByIds(). Even if they were created, you've assumed the ID is available in the merger itself, it is not. You've assumed there is a view with column mapping, there is not. The 1st example was great and focused on the headline of the question. — crokusek, Aug 14 '15 at 18:04
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87042/discussion-between-robert-mckee-and-crokusek). — Robert McKee, Aug 14 '15 at 21:31

Is it bad practice to purposely rely on Linq Side Effects?

2 Answers2

Linked