2

As a programming exercise I'm trying to build a function in Haskell where given a list it splits the list whenever an element is repeated. [1,2,3,3,4,5] would split into [[1,2,3],[3,4,5]] for example. My first idea was to split the list into a list of lists with single elements, where [1,2,3,3,4,5] would become [[1],[2],[3],[3],[4],[5]] and then merge lists only when the elements being compared are not equal, but implementing this has been a headache for me as I'm very new to Haskell and recursion has always given me trouble. I think something is wrong with the function I'm using to combine the lists, it will only ever return a list where all the elements that were broken apart are combined, where [1,2,3,3,4,5] becomes [[1],[2],[3],[3],[4],[5]] but my split_help function will transform this into [[1,2,3,3,4,5]] instead of [[1,2,3],[3,4,5]]

I've pasted my incomplete code below, it doesn't work right now but it should give the general idea of what I'm trying to accomplish. Any feedback on general Haskell code etiquette would also be welcome.

split_breaker breaks the list into a list of list and split_help is what I'm trying to use to combine unequal elements.

split_help x y
     | x /= y = x ++ y
     | otherwise = []

split_breaker :: Eq a => [a] -> [[a]]
split_breaker [] = []
split_breaker [x] = [[x]]
split_breaker (x:xs) = [x]:split_breaker xs

split_at_duplicate :: Eq a => [a] -> [[a]]
split_at_duplicate [x] = [[x]]
split_at_duplicate (x:xs) = foldl1 (split_help) (split_breaker [xs])
  • 2
    @Stef It does not work, unfortunately, and not just because the typo in `/=`. The issue is that `groupBy f [x1,x2,x3,x4]` will check `f x1 x2, f x1 x3, f x1 x4, ...` to determine the first group; it does not compare the last two as wanted. (Pedantically, I guess the library assumes an equivalence and could implement this differently -- anyway there's no guarantee) – chi Feb 11 '22 at 18:58
  • @chi You're right, but I'm a bit puzzled now - one of the examples given on the documentation page is with `<=`, which is not an equivalence relation: https://hackage.haskell.org/package/groupBy-0.1.0.0/docs/Data-List-GroupBy.html – Stef Feb 12 '22 at 12:07
  • @Stef This looks as a bug to me -- I just tried that example and the result was different. So, either the docs is correct and the implementation is buggy (in which case your solution would also work!), or vice versa. Interesting find. – chi Feb 12 '22 at 12:27
  • @chi So apparently there are two versions of `groupBy`, and the documentation I linked is about "This module provides an alternative definition for groupBy which does not require a transitive equivalence predicate." – Stef Feb 12 '22 at 13:40
  • @Stef Ah! That explains it, then. I thought that was the standard implementation. – chi Feb 12 '22 at 13:59

2 Answers2

2

Do you want to work it something like this?

splitAtDup [1,2,3,3,3,4,4,5,5,5,5,6]
[[1,2,3],[3],[3,4],[4,5],[5],[5],[5,6]]

Am I right?

Then do it simple:

splitAtDup :: Eq a => [a] -> [[a]]
splitAtDup (x : y : xs) | x == y = [x] : splitAtDup (y : xs)
splitAtDup (x : xs) =
  case splitAtDup xs of
    x' : xs' -> (x : x') : xs'
    _ -> [[x]]
splitAtDup [] = []
Stefan Dorn
  • 569
  • 2
  • 13
  • 1
    Thanks! Looks like this works. I think I kind of understand what the case of is doing, but could you elaborate on it? I'm confused in particular about what the underscore in _ -> [[x]] represents. – 2004DodgeNeon Feb 11 '22 at 21:19
  • This doesn't yield any of a sublist until the whole sublist is done. For example, `splitAtDup (1:2:3:3:4:5:undefined)` is [1,2,3]:⊥, but it could be [1,2,3]:(3:4:5:⊥):⊥. – Joseph Sible-Reinstate Monica Feb 12 '22 at 06:12
2

Here's a maximally lazy approach:

splitWhen :: (a -> a -> Bool) -> [a] -> [[a]]
splitWhen f = foldr go [[]] where
  go x acc = (x:xs):xss where
    xs:xss = case acc of
      (z:_):_ | f x z -> []:acc
      _ -> acc

splitAtDup :: Eq a => [a] -> [[a]]
splitAtDup = splitWhen (==)

To verify the laziness, try this:

take 2 $ take 4 <$> splitAtDup (1:2:3:3:4:5:6:undefined)

It can be fully evaluated to normal form as [[1,2,3],[3,4,5,6]].