While debugging a Python program, I recently discovered that Python's itertools.groupby() function effectively requires its input to be sorted, because it only groups identical elements that occur consecutively. The documentation says:
Generally, the iterable needs to already be sorted on the same key function.
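A small illustration of what "sorted on the same key function" means in practice. The word list and the helper first_letter are made up for this example. Without sorting, groupby() starts a new group every time the key changes. After sorting with the same key, equal keys become adjacent and each key appears once.

```python
from itertools import groupby

words = ["apple", "bean", "avocado", "banana", "cherry"]

def first_letter(word: str) -> str:
    # Hypothetical key function used only for this illustration.
    return word[0]

# Unsorted input: a new group begins whenever the key value changes.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]
# [('a', ['apple']), ('b', ['bean']), ('a', ['avocado']), ('b', ['banana']), ('c', ['cherry'])]

# Sorting on the same key first makes all equal keys adjacent.
# sorted() is stable, so items within a group keep their original order.
sorted_groups = [
    (k, list(g))
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
]
# [('a', ['apple', 'avocado']), ('b', ['bean', 'banana']), ('c', ['cherry'])]
```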
The operation of groupby() is similar to the uniq filter in Unix.
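The uniq analogy can be made concrete with a short sketch. The input list is made up. Keeping only the keys of groupby() mirrors uniq, sorting first mirrors sort | uniq, and counting each run mirrors uniq -c.

```python
from itertools import groupby

lines = ["b", "a", "a", "b", "b", "c"]

# Like `uniq`: collapse runs of adjacent duplicates only.
uniq_like = [key for key, _ in groupby(lines)]
# ['b', 'a', 'b', 'c']

# Like `sort | uniq`: sort first so every duplicate becomes adjacent.
sort_uniq_like = [key for key, _ in groupby(sorted(lines))]
# ['a', 'b', 'c']

# Like `uniq -c`: report the length of each run next to its value.
uniq_c_like = [(len(list(group)), key) for key, group in groupby(lines)]
# [(1, 'b'), (2, 'a'), (2, 'b'), (1, 'c')]
```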
For both uniq and Python's groupby(), I wonder what a use case might be for applying them without sorting first.
Clearly, sorting can be expensive and should be avoided whenever possible. However, if sorting is inevitable in practice anyway, why did the Python developers decide not to make it the default in groupby()? The current behaviour seems to cause a lot of confusion among users of the function.
I noticed that this design decision does not seem to be universal. Scala's groupBy(), for example, collects all elements with the same key into a Map regardless of where they appear in the collection, so the caller never has to sort first.
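For contrast, here is a minimal Python sketch of grouping in the style of Scala's groupBy(), which returns a Map from key to elements. The function name group_by is hypothetical, not part of the standard library. It uses a hash table instead of sorting, so elements with equal keys are collected wherever they occur, in a single pass.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)

def group_by(items: Iterable[T], key: Callable[[T], K]) -> Dict[K, List[T]]:
    """Hypothetical helper: collect every item with the same key, no sorting needed."""
    groups: Dict[K, List[T]] = defaultdict(list)
    for item in items:
        # Equal keys land in the same bucket regardless of their position.
        groups[key(item)].append(item)
    return dict(groups)

words = ["apple", "bean", "avocado", "banana", "cherry"]
print(group_by(words, key=lambda w: w[0]))
# {'a': ['apple', 'avocado'], 'b': ['bean', 'banana'], 'c': ['cherry']}
```

The trade-off is that this approach needs hashable keys and holds every element in memory, whereas groupby() is lazy and only ever looks at the current run.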
My question, then, is: which use cases led to the design decision not to sort implicitly in uniq and Python's groupby()?