
I just saw this bit of code that has a count++ side effect in the .GroupBy key selector (originally here).

object[,] data; // This contains all the data.
int count = 0;
List<string[]> dataList = data.Cast<string>()
                          .GroupBy(x => count++ / data.GetLength(1))
                          .Select(g => g.ToArray())
                          .ToList();

This terrifies me because I have no idea how many times the implementation will invoke the key selector function. And I also don't know if the function is guaranteed to be applied to each item in order. I realize that, in practice, the implementation may very well just call the function once per item in order, but I never assumed that as being guaranteed, so I'm paranoid about depending on that behaviour -- especially given what may happen on other platforms, other future implementations, or after translation or deferred execution by other LINQ providers.

As it pertains to a side-effect in the predicate, are we offered some kind of written guarantee, in terms of a LINQ specification or something, as to how many times the key selector function will be invoked, and in what order?

Please, before you mark this question as a duplicate, I am looking for a citation of documentation or specification that says one way or the other whether this is undefined behaviour or not.


For what it's worth, I would have written this kind of query the long way: first a Select with a selector that takes an index, creating an anonymous object that holds the index and the original data, then a GroupBy on that index, and finally a Select that pulls the original data back out of the anonymous object. That seems more like a correct way of doing functional programming, and more like something that could be translated to a server-side query. The side effect in the key selector just seems wrong to me, and against the principles of both LINQ and functional programming, so I would assume there is no guarantee specified and that this may very well be undefined behaviour. Is it?
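To make that concrete, here is a minimal sketch of the long form. The 2x3 sample array and the `columns` local are made up for illustration; only the shape of the query matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical sample data standing in for the real array.
        object[,] data = { { "a", "b", "c" }, { "d", "e", "f" } };
        int columns = data.GetLength(1);

        List<string[]> dataList = data.Cast<string>()
            // Pair each item with its position; no captured state is mutated.
            .Select((value, index) => new { value, index })
            // Side-effect-free key selector: the row number.
            .GroupBy(x => x.index / columns)
            // Drop the index again and keep only the original values.
            .Select(g => g.Select(x => x.value).ToArray())
            .ToList();

        foreach (string[] row in dataList)
            Console.WriteLine(string.Join(",", row));
    }
}
```

The key selector here depends only on its argument, so it does not matter how often or in what order it is invoked.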

I realize this question may be difficult to answer if the documentation and LINQ specification actually say nothing regarding side effects in predicates. I want to know specifically whether:

  1. Specs say it's permissible and how. (I doubt it)
  2. Specs say it's undefined behaviour (I suspect this is true and am looking for a citation)
  3. Specs say nothing. (Sloppy spec, if you ask me, but it would be nice to know if others have searched for language regarding side-effects and also come up empty. Just because I can't find it doesn't mean it doesn't exist.)
Wyck
  • AFAIK, the answer is "no; so don't do that" – Marc Gravell Sep 18 '19 at 14:26
  • Possible duplicate of [Is it bad practice to purposely rely on Linq Side Effects?](https://stackoverflow.com/questions/32000607/is-it-bad-practice-to-purposely-rely-on-linq-side-effects) – Sinatr Sep 18 '19 at 14:29
  • @Sinatr, I'm calling for a citation of documentation. I understand it's bad practice. I understand what the implementation does. I'm looking for the specification, which I cannot find. This is me saying that just because I haven't found any text that specifies the behaviour doesn't mean it doesn't exist. I'm wondering if someone can show me where it's written that either it's undefined behaviour, or it's permissible. This is a _citation needed_ request. – Wyck Sep 18 '19 at 14:44
  • @Wyck the answer to that would be "all of them". Is there a practical purpose to this or are you trying to convince someone who insists that nothing explicitly forbids side-effects? Because *nothing* does, but there are no guarantees either. LINQ is just the language. In LINQ to Objects the source shows that the selector is called only once, BUT a far more important consideration would be *how does GroupBy run?* Is it a greedy algorithm, i.e. does it consume the entire source before producing results or not? – Panagiotis Kanavos Sep 18 '19 at 14:49
  • In PLINQ, even working on an IEnumerable source, you'd have no guarantee on the order the selector is called or even *which* thread would call it. In LINQ to Objects you can assume the source's order but what source would that be? LINQ operators can return custom enumerators and often do for performance reasons. – Panagiotis Kanavos Sep 18 '19 at 14:52
  • I'm not sure if this fits your case, but look here [Page 203](https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-334.pdf): _"The C# language does not specify the execution semantics of query expressions"_ – FCin Sep 18 '19 at 14:54
  • @FCin That's the kind of definitive thing I was looking for. Thanks. If you submit an answer, I will mark it as such. – Wyck Sep 18 '19 at 14:59

2 Answers


According to the official C# Language Specification, on page 203, we can read (emphasis mine):

12.17.3.1 The C# language does not specify the execution semantics of query expressions. Rather, query expressions are translated into invocations of methods that adhere to the query-expression pattern (§12.17.4). Specifically, query expressions are translated into invocations of methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy, OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast. These methods are expected to have particular signatures and return types, as described in §12.17.4. These methods may be instance methods of the object being queried or extension methods that are external to the object. These methods implement the actual execution of the query.

FCin

From looking at the source code of GroupBy in corefx on GitHub, it does seem that the key selector function is called once per element, in the order in which the preceding IEnumerable provides the elements. I would in no way consider this a guarantee though.

In my view, any IEnumerable that cannot safely be enumerated multiple times is a big red flag that you may want to reconsider your design choices. One interesting issue that could arise: if you view the contents of this IEnumerable in the Visual Studio debugger, it will probably break your code, since doing so would cause the count variable to go up.

The reason this code hasn't exploded up until now is probably because the IEnumerable is never stored anywhere, since .ToList is called right away. Therefore there is no risk of multiple enumerations (again, with the caveat about viewing it in the debugger and so on).
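As a rough illustration of that risk, the sketch below keeps the deferred query in a variable and enumerates it twice. The sample array and the `columns` local are made up for the example, and the behaviour in the comments is what the current LINQ to Objects implementation does, not something any specification promises.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical sample data: 2 rows of 3 columns.
        object[,] data = { { "a", "b", "c" }, { "d", "e", "f" } };
        int columns = data.GetLength(1);
        int count = 0;

        // Deferred query: the key selector does not run until enumeration.
        IEnumerable<IGrouping<int, string>> query = data.Cast<string>()
            .GroupBy(x => count++ / columns);

        // Each enumeration re-runs the key selector, and count keeps climbing.
        Console.WriteLine(string.Join(",", query.Select(g => g.Key))); // 0,1 with LINQ to Objects today
        Console.WriteLine(string.Join(",", query.Select(g => g.Key))); // 2,3 with LINQ to Objects today
    }
}
```

In this particular case the contents of the groups happen to stay the same on the second pass, because the element count is a multiple of the column count; only the keys drift. Anything that reads the keys, or any captured counter that does not line up so neatly, would see different results on each enumeration.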

Daniel Crha
  • I appreciate you delving into the implementation, but this is not a question of implementation. This is a question of specification - requirements placed on all implementations. – Wyck Sep 18 '19 at 14:46
  • So out of curiosity, I peeked into the C# language spec. While LINQ is a part of the spec, it's merely the part that converts query syntax into method syntax. MSDN does mention that element groupings are emitted in the order that their first elements appear in the source, and that within a grouping the same ordering holds. This is still not part of any binding specification I know of though, so it could just as well change between .NET versions. – Daniel Crha Sep 18 '19 at 15:09