6

I am wondering why the following call of groupBy does not work: My predicate is x < y, so I expect [1, 6] to be a group, but instead, Haskell put [1, 6, 4, 2] into a group.

Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,2]
[[8],[5],[3],[2],[1,6,4,2]]

More strangely, when I change the last number to -2, I expect the same behavior as in the above example. That is, since both 2 and -2 are less than 4, I expect that in the result [1, 6, 4, -2] would make up a group. But instead, This time, Haskell put -2 to be a group.

Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,-2]
[[8],[5],[3],[2],[1,6,4],[-2]]

Do I have a wrong understanding of groupBy?

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
gefei
  • 18,922
  • 9
  • 50
  • 67
  • 4
    `groupBy` is intended to be used with an _equivalence relation_, specifically, in `groupBy p` you should have `p x y ≡ p y x`. But, I'm surprised that this is actually a _requirement_... – leftaroundabout Jul 31 '21 at 10:12
  • 1
    I mean, it's not a requirement. You can pass whatever function you want to `groupBy`. But you may get a surprising set of groups back if you use a surprising equality predicate. – amalloy Jul 31 '21 at 10:14
  • @leftaroundabout @amalloy Thanks. I did not know `groupBy` "requires" an equality relation. Still, it would be interesting to know why -2 is handled differently than 2 – gefei Jul 31 '21 at 10:17
  • 1
    `x` is always the *first* item of the group that is constructed. – Willem Van Onsem Jul 31 '21 at 10:18
  • Related: https://stackoverflow.com/questions/58983130/having-trouble-with-groupby-function – chi Jul 31 '21 at 15:56
  • [related](https://stackoverflow.com/questions/14403293/need-to-partition-a-list-into-lists-based-on-breaks-in-ascending-order-of-elemen/14403413#14403413). – Will Ness Aug 06 '21 at 18:55

2 Answers2

5

In the implementation of the groupBy, x is always the first item of the sublist. Indeed, groupBy is implemented as:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

especially the span (eq x) is important here, since x will be the first item of a new group.

Since x is thus not the previous value in the list. If we thus run groupBy with the list [5, 3, 2, 1, 6, 4, -2], we get:

list current list x=? check with outcome
[5,3,2,1,6,4,-2] [8] 8 / /
[5,3,2,1,6,4,-2] [8] 8 5 False
[3,2,1,6,4,-2] [5] 5 / /
[3,2,1,6,4,-2] [5] 5 3 False
[3,2,1,6,4,-2] [3] 3 / /
[2,1,6,4,-2] [3] 3 2 False
[2,1,6,4,-2] [2] 2 1 False
[1,6,4,-2] [2] 2 / /
[1,6,4,-2] [2] 2 1 False
[6,4,-2] [1] 1 / /
[4,-2] [1,6] 1 6 True
[-2] [1,6,4] 1 4 True
[] [-2] -2 / /

Especially the case where we compare x=1 and y=4 is important. If x was only the previous value, we should start generating a new list, but since x is the first item of the list, that is not the case.

Normally you should only work with an equivalence relation ~ [wiki], such relation is:

  1. reflexive: so x ~ x is true;
  2. symmetric: so x ~ y if and only if y ~ x; and
  3. transitive: so x ~ y and y ~ z implies that x ~ z.

Your equivalence relation is not reflexive, nor is it symmetric. This is thus not a valid function to work with groupBy.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • this answer would be greatly upgraded, if you said what function the OP was thinking of, the one that compares neighbouring values to put group border or not – Noone AtAll Jul 31 '21 at 20:47
2

The conceptual definition of groupBy p l is that it yields sublists of l such that for each xs in l, you have

all (==True) [p x y | x<-xs, y<-xs]

IOW, each sublist should be part of an equivalence class of p. That notion only makes sense if p is an equivalence relation. In particular, you need p x y ≡ p y x, and the defining equation also assumes that p x x is always true.

The implementation in the standard libraries shows that idea quite clearly: each x:ys list in the result has ys defined by the span of elements that are equivalent to x by the relation. So in your case, you get 1:[6,4,2], where 6,4,2 are all greater than 1.

Evidently, groupBy doesn't actually check p x y for all pairs of elements in the result lists, so this really only makes sense if p is indeed an equivalence relation.

What you expected the idea to be – and IMO this is not unreasonable – is that only for all x,y such that x is the left neighbour of y, you want p x y to hold. This is in general a weaker condition, but if p is an equivalence relation then it actually implies the original condition, because such a relation also is transitive. So maybe the implementation should actually be

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ (x:l) = (x:xs) : zss
 where (xs,zss) = case l of
        [] -> ([],[])
        zs@(y:_)
         -> let ys:zss' = groupBy' p zs
            in if p x y then (ys, zss')
                        else ([], ys:zss')

(This could be simplified a bit, but then it wouldn't be as lazy as the old implementation.)

leftaroundabout
  • 117,950
  • 5
  • 174
  • 319