Recursively sort non-contiguous list to list of contiguous lists

Question

I've been trying to learn a bit of functional programming (with Haskell & Erlang) lately and I'm always amazed at the succinct solutions people can come up with when they can think recursively and know the tools.

I want a function to convert a list of sorted, unique, non-contiguous integers into a list of contiguous lists, i.e:

[1,2,3,6,7,8,10,11]

to:

[[1,2,3], [6,7,8], [10,11]

This was the best I could come up with in Haskell (two functions)::

make_ranges :: [[Int]] -> [Int] -> [[Int]]
make_ranges ranges [] = ranges
make_ranges [] (x:xs)
    | null xs = [[x]]
    | otherwise = make_ranges [[x]] xs
make_ranges ranges (x:xs)
    | (last (last ranges)) + 1 == x =
        make_ranges ((init ranges) ++ [(last ranges ++ [x])]) xs
    | otherwise = make_ranges (ranges ++ [[x]]) xs

rangify :: [Int] -> [[Int]]
rangify lst = make_ranges [] lst

It might be a bit subjective but I'd be interested to see a better, more elegant, solution to this in either Erlang or Haskell (other functional languages too but I might not understand it.) Otherwise, points for just fixing my crappy beginner's Haskell style!

Wow, great answers all, I can learn something from all of them. I'll give it a little while but I'll be amazed if anyone can beat @barsoaps on the succinct, elegant front. — Mikesname, Feb 18 '11 at 01:14
It's good to see you are using pattern matching and guards; these are usually signs of more than just "crappy beginner's Haskell style". I don't know if you did this on purpose but your `make_ranges` is written in beautiful tail-call style (all non-base-case definitions of make_ranges are defined as a direct call to make_ranges). That style is a very Erlangy thing to do, but nevertheless should be applauded. — Dan Burton, Feb 18 '11 at 01:59

Dan Burton · Accepted Answer · 2011-02-19T01:07:46.547

10

Most straightforward way in my mind is a foldr:

ranges = foldr step []
    where step x [] = [[x]]
          step x acc@((y:ys):zs) | y == x + 1 = (x:y:ys):zs
                                 | otherwise  = [x]:acc

Or, more concisely:

ranges = foldr step []
    where step x ((y:ys):zs) | y == x + 1 = (x:y:ys):zs
          step x acc = [x]:acc

But wait, there's more!

abstractRanges f = foldr step []
    where step x ((y:ys):zs) | f x y = (x:y:ys):zs
          step x acc = [x]:acc

ranges = abstractRanges (\x y -> y == x + 1)
powerRanges = abstractRanges (\x y -> y == x*x) -- mighty morphin

By turning the guard function into a parameter, you can group more interesting things than just +1 sequences.

*Main> powerRanges [1,1,1,2,4,16,3,9,81,5,25]
[[1,1,1],[2,4,16],[3,9,81],[5,25]]

The utility of this particular function is questionable...but fun!

edited Feb 19 '11 at 01:07

answered Feb 18 '11 at 01:22

Dan Burton

53,238
27
117
198

I haven't seen the '@' operator before. Could you point me at some documentation or explain? – Alex M Feb 18 '11 at 18:20
2

@amccausl The @ operator is read as "as" and it is a pattern matching operator. Here's a [link](http://stackoverflow.com/questions/1153465/what-does-the-symbol-mean-in-reference-to-lists-in-haskell) to another SO page that discusses its uses – Ken Wayne VanderLinde Feb 18 '11 at 20:07

Landei · Answer 2 · 2011-02-22T16:02:29.967

I can't believe I got the shortest solution. I know this is no code golf, but I think it is still quite readable:

import GHC.Exts
range xs = map (map fst) $ groupWith snd $ zipWith (\a b -> (a, a-b)) xs [0..]

or pointfree

range = map (map snd) . groupWith fst . zipWith (\a b -> (b-a, b)) [0..]

BTW, groupWith snd can be replaced with groupBy (\a b -> snd a == snd b) if you prefer Data.List over GHC.Exts

[Edit]

BTW: Is there a nicer way to get rid of the lambda (\a b -> (b-a, b)) than (curry $ (,) <$> ((-) <$> snd <*> fst) <*> snd) ?

[Edit 2]

Yeah, I forgot (,) is a functor. So here is the obfuscated version:

range = map (map fst) . groupWith snd . (flip $ zipWith $ curry $ fmap <$> (-).fst <*> id) [0..]

Suggestions are welcome...

barsoap · Answer 3 · 2011-02-18T02:23:37.277

4

import Data.List (groupBy)

ranges xs = (map.map) snd 
          . groupBy (const fst) 
          . zip (True : zipWith ((==) . succ) xs (tail xs))
          $ xs

As to how to come up with such a thing: I started with the zipWith f xs (tail xs), which is a common idiom when you want to do something on consecutive elements of a list. Likewise is zipping up a list with information about the list, and then acting (groupBy) upon it. The rest is plumbing.

Then, of course, you can feed it through @pl and get:

import Data.List (groupBy)
import Control.Monad (ap)
import Control.Monad.Instances()

ranges = (((map.map) snd) 
           . groupBy (const fst)) 
         .) =<< zip 
         . (True:) 
         . ((zipWith ((==) . succ)) `ap` tail)

, which, by my authoritative definition, is evil due to Mondad ((->) a). Twice, even. The data flow is meandering too much to lay it out in any sensible way. zipaptail is an Aztec god, and Aztec gods aren't to be messed with.

edited Feb 18 '11 at 02:23

answered Feb 18 '11 at 00:59

barsoap

3,376
3
23
21

Can't seem to get this to compile in GHCI: `Couldn't match expected type '[Bool]' against inferred type '[a] -> [Bool]', In the second argument of '(:)', namely 'zipWith ((==) . (+ 1)) xs'.` Am I doing something thick? – Mikesname Feb 18 '11 at 01:32
Whoops. That's what happens when you replace parenthesis with `($)` without realising that messes with `(:)`, and then not even testing it. Fixed. – barsoap Feb 18 '11 at 01:40
s/`ap`/<*> and you can use the Applicative (S combinator). The latter version is much neater imo, except for ((map.map) snd) – Tony Morris Feb 18 '11 at 04:32
@Tony Morris *any* ((->) a) instance besides Functor is evil. – barsoap Feb 18 '11 at 13:10

score 4 · Answer 4 · answered Feb 18 '11 at 09:57

4

Another version in Erlang:

part(List) -> part(List,[]).
part([H1,H2|T],Acc) when H1 =:= H2 - 1 -> 
    part([H2|T],[H1|Acc]);
part([H1|T],Acc) ->
    [lists:reverse([H1|Acc]) | part(T,[])];
part([],Acc) -> Acc.

answered Feb 18 '11 at 09:57

Lukas

5,182
26
17

ephemient · Answer 5 · 2011-02-18T01:43:40.330

3

Try reusing standard functions.

import Data.List (groupBy)

rangeify :: (Num a) => [a] -> [[a]]
rangeify l = map (map fst) $ groupBy (const snd) $ zip l contigPoints
  where contigPoints = False : zipWith (==) (map (+1) l) (drop 1 l)

Or, following (mixed) advice to use unfoldr, stop abusing groupBy, and be happy using partial functions when it doesn't matter:

import Control.Arrow ((***))
import Data.List (unfoldr)

spanContig :: (Num a) => [a] -> [[a]]
spanContig l =
    map fst *** map fst $ span (\(a, b) -> a == b + 1) $ zip l (head l - 1 : l)

rangeify :: (Num a) => [a] -> [[a]]
rangeify = unfoldr $ \l -> if null l then Nothing else Just $ spanContig l

edited Feb 18 '11 at 01:43

answered Feb 18 '11 at 00:52

ephemient

198,619
38
280
391

2

One nice thing about this solution is that it uses only total functions -- i.e. it is guaranteed not to throw an exception. In general, one wants to stay away from `head`, `last`, `tail`, `init` and friends whenever possible. Additionally, this solution is written in a lazy streaming style so that it works on infinite lists as well. Functions such as `reverse`, and `last` force traversal of the entire list so can't work on infinite lists, and can lead to bad performance on long finite ones. – sclv Feb 18 '11 at 01:02
@barsoap -- beat me to it too :-) The `zipWith f xs (drop 1 xs)` idiom is such an important one to teach haskell beginners. – sclv Feb 18 '11 at 01:03
@barsoap, ephemient: Gah, people always manage to write more concise functions than I can. – Robert Massaioli Feb 18 '11 at 01:04
`tail` instead of `drop 1` works perfectly fine, the outermost zip takes care of that. – barsoap Feb 18 '11 at 01:04
The abuse of groupBy on the other hand is a sort of bad habit. It would be more proper to write it as an unfold (and probably it would fuse better as well?). But remember, by the way that `drop 1` is total and `tail` isn't, so I consider the former better almost always. – sclv Feb 18 '11 at 01:05
I came up with a second solution that uses zip `ap` tail but it is way too wordly. – Robert Massaioli Feb 18 '11 at 01:05
@sclv I just don't like numbers in list code. It just feels wrong. I wouldn't mind another, total version of `tail` at all, though. – barsoap Feb 18 '11 at 01:21

score 3 · Answer 6 · answered Feb 18 '11 at 01:42

3

k z = map (fst <$>) . groupBy (const snd) . 
        zip z . (False:) . (zipWith ((==) . succ) <*> tail) $ z

answered Feb 18 '11 at 01:42

Tony Morris

3,045
2
29
17

1

I guess the key to understand this are several pints of Australian beer :) – Landei Feb 18 '11 at 12:56
This is really just a slightly more golfed version of the first one by ephemient. – sclv Feb 18 '11 at 15:45

score 3 · Answer 7 · answered Feb 18 '11 at 14:31

3

Erlang using foldr:

ranges(List) ->
    lists:foldr(fun (X, [[Y | Ys], Acc]) when Y == X + 1 ->
                        [[X, Y | Ys], Acc];
                    (X, Acc) ->
                        [[X] | Acc]
                end, [], List).

answered Feb 18 '11 at 14:31

Krzysiek Goj

1,598
4
16
18

Note this is the same as my answer's "concise" version, but in Erlang. We think alike. :) – Dan Burton Feb 19 '11 at 01:14

score 2 · Answer 8 · answered Feb 18 '11 at 00:34

This is my v0.1 and I can probably make it better:

makeCont :: [Int] -> [[Int]]
makeCont [] = []
makeCont [a] = [[a]]
makeCont (a:b:xs) = if b - a == 1
                        then (a : head next) : tail next
                        else [a] : next
                    where
                        next :: [[Int]]
                        next = makeCont (b:xs)

And I will try and make it better. Edits coming I think.

Adam Lindberg · Answer 9 · 2011-02-18T09:57:47.100

As a comparison, here's an implementation in Erlang:

partition(L)  -> [lists:reverse(T) || T <- lists:reverse(partition(L, {[], []}))].

partition([E|L], {R, [EL|_] = T}) when E == EL + 1 -> partition(L, {R, [E|T]});
partition([E|L], {R, []})                          -> partition(L, {R, [E]});
partition([E|L], {R, T})                           -> partition(L, {[T|R], [E]});
partition([],    {R, []})                          -> R;
partition([],    {R, T})                           -> [T|R].

score 2 · Answer 10 · edited May 23 '17 at 12:30

2

It can be pretty clear and simple in the Erlang:

partition([]) -> [];
partition([A|T]) -> partition(T, [A]).

partition([A|T], [B|_]=R) when A =:= B+1 -> partition(T, [A|R]);
partition(L, P) -> [lists:reverse(P)|partition(L)].

Edit: Just for curiosity I have compared mine and Lukas's version and mine seems about 10% faster either in native either in bytecode version on testing set what I generated by lists:usort([random:uniform(1000000)||_<-lists:seq(1,1000000)]) on R14B01 64b version at mine notebook. (Testing set is 669462 long and has been partitioned to 232451 sublists.)

Edit2: Another test data lists:usort([random:uniform(1000000)||_<-lists:seq(1,10000000)]), length 999963 and 38 partitions makes bigger diference in native code. Mine version finish in less than half of time. Bytecode version is only about 20% faster.

Edit3: Some microoptimizations which provides additional performance but leads to more ugly and less maintainable code:

part4([]) -> [];
part4([A|T]) -> part4(T, A, []).

part4([A|T], B, R) when A =:= B+1 -> part4(T, A, [B|R]);
part4([A|T], B, []) -> [[B]|part4(T, A, [])];
part4([A|T], B, R) -> [lists:reverse(R, [B])|part4(T, A, [])];
part4([], B, R) -> [lists:reverse(R,[B])].

edited May 23 '17 at 12:30

Community

1
1

answered Feb 18 '11 at 14:25

Hynek -Pichi- Vychodil

26,174
5
52
73

Accumulators will always win when you do it for long lists, when doing it for lost of small lists however, using non-tail recursion can be faster, I'll have to check :) – Lukas Feb 22 '11 at 07:34
@Lukas: [Myth: Tail-recursive functions are MUCH faster than recursive functions](http://www.erlang.org/doc/efficiency_guide/myths.html#tail_recursive). I think that mine testing lists are big enough ;-) – Hynek -Pichi- Vychodil Feb 22 '11 at 10:55
My point was that my implementation might be faster for smaller lists :) but as they say in the myth, it comes down to how the compiler optimizes your code and then how that code runs on your hw... so many variables so little time – Lukas Feb 22 '11 at 20:20
@Lukas, first test uses 232451 sublists with average length 2.88. I think those are short enough. – Hynek -Pichi- Vychodil Apr 22 '11 at 08:24

score 2 · Answer 11 · answered Feb 18 '11 at 15:51

The standard paramorphism recursion scheme isn't in Haskell's Data.List module, though I think it should be. Here's a solution using a paramorphism, because you are building a list-of-lists from a list, the cons-ing is a little tricksy:

contig :: (Eq a, Num a) => [a] -> [[a]]
contig = para phi [] where
  phi x ((y:_),(a:acc)) | x + 1 == y = (x:a):acc
  phi x (_,       acc)               = [x]:acc

Paramorphism is general recursion or a fold with lookahead:

para :: (a -> ([a], b) -> b) -> b -> [a] -> b
para phi b []     = b
para phi b (x:xs) = phi x (xs, para phi b xs)

OJ. · Answer 12 · 2011-02-18T01:15:19.273

1

Here's an attempt from a haskell noob

ranges ls = let (a, r) = foldl (\(r, a@(h:t)) e -> if h + 1 == e then (r, e:a) else (a:r, [e])) ([], [head ls]) (tail ls)
            in           reverse . map reverse $ r : a

edited Feb 18 '11 at 01:15

answered Feb 18 '11 at 00:38

OJ.

28,944
5
56
71

Recursively sort non-contiguous list to list of contiguous lists

12 Answers12