4

In another question, a user suggested to write code like to that:

def list = ['a', 'b', 'c', 'd']
def i = 0; 
assert list.collect { [i++] } == [0, 1, 2, 3]

Such code is, in other languages, considered bad practice because the content of collect changes the state of it's context (here it changes the value of i). In other words, the closure has side-effects.

Such higher order functions should be able to run the closure in parallel, and assemble it in a new list again. If the processing in the closure are long, CPU intensive operations, it may be worth executing them in separate threads. It would be easy to change collect to use an ExecutorCompletionService to achieve that, but it would break the above code.

Another example of a problem is if, for some reason, collect browse the collection in, say, reverse order, in which case the result would be [3, 2, 1, 0]. Note that in this case, the list have not been reverted, 0 is really the result of applying the closure to 'd'!

Interestingly, these functions are documented with "Iterates through this collection" in Collection's JavaDoc, which suggests the iteration is sequential.

Does the groovy specification explicitly defines the order of execution in higher order functions like collect or each? Is the above code broken, or is it OK?

Antoine
  • 5,158
  • 1
  • 24
  • 37

2 Answers2

3

I don't like explicit external variables being relied upon in my closures for the reasons you give above.

Indeed, the less variables I have to define, the happier I am ;-)

For the possibly parallel things as well, always code with a view to wrapping it with some level of GPars loveliness should it prove too much for a single thread to handle. For this, as you say, you want as little mutability as possible and to try and completely avoid side-effects (such as the external counter pattern above)

As for the question itself, if we take collect as an example function, and examine the source code, we can see that given an Object (Collection and Map are done in a similar way with slight differences as to how the Iterator is referenced) it iterates along InvokerHelper.asIterator(self), adding the result of each closure call to the resultant list.

InvokerHelper.asIterator (again source is here) basically calls the iterator() method on the Object passed in.

So for Lists, etc it will iterate down the objects in the order defined by the iterator.

It is therefore possible to compose your own class which follows the Iterable interface design (doesn't need to implement Iterable though, thanks to duck-typing), and define how the collection will be iterated.

I think by asking about the Groovy specification though, this answer might not be what you want, but I don't think there is an answer. Groovy has never really had a 'complete' specification (indeed this is point about groovy that some people dislike).

tim_yates
  • 167,322
  • 27
  • 342
  • 338
1

I think keeping the functions passed collect or findAll side-effect free is a good idea in general, not only for keeping the complexity low but making the code more parallel-friendly in case parallel execution is needed in the future.

But in the case of each there is not much point in keeping the function side-effect free, as it wouldn't do anything (in fact the sole purpose of this method is to replace act as a for-each loop). The Groovy's documentation have some examples of using each (and its variants, eachWithIndex and reverseEach) that require an execution order to be defined.

Now, from a pragmatic point of view, I think it can sometimes be OK to use functions with some side effects in methods like collect. For example, to transform a list in [index, value] pairs a transpose and range can be used

def list = ['a', 'b', 'c']
def enumerated = [0..<list.size(), list].transpose()
assert enumerated == [[0,'a'], [1,'b'], [2,'c']]

Or even an inject

def enumerated = list.inject([]) { acc, val -> acc << [acc.size(), val] }

But a collect and a counter does the trick too and I think the result is the most readable:

def n = 0, enumerated = list.collect{ [n++, it] }

Now, this example wouldn't make sense if Groovy provided acollect and similar methods with a index-value-param function (see Jira issue), but it kinda shows that sometimes practicality beats purity IMO :)

Community
  • 1
  • 1
epidemian
  • 18,817
  • 3
  • 62
  • 71