
I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]] that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19] should return

[[1,  3,  6,  10, 15]
,[2,  5,  9,  14]
,[4,  8,  13, 19]
,[7,  12, 18]
,[11, 17]
,[16]]

By efficient, I mean that I want it to run in O(n) time where n is the length of the list.


Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is a constant time operation. A very imperative Python function which accomplishes this is:

def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        # Start a new row when we step past the current bottom of the triangle.
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        # Walk up the anti-diagonal; from the top row, jump down to the
        # start of the next anti-diagonal.
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array

This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.

Perhaps there's some clever (or not-so-clever) way to foldr over the input list, but I haven't been able to sort it out.

Peter Kagey
  • Does this answer your question? [Getting all the diagonals of a matrix in Haskell](https://stackoverflow.com/questions/32465776/getting-all-the-diagonals-of-a-matrix-in-haskell) – MikaelF Apr 17 '20 at 02:25
    @MikaelF I don't think so. In particular, that assumes that for input you have a matrix, not a (potentially infinite) list. – Joseph Sible-Reinstate Monica Apr 17 '20 at 02:26
  • @JosephSible-ReinstateMonica I see, you're right. – MikaelF Apr 17 '20 at 02:28
  • More idiomatically than `foldr`, you may do something like `unfoldr (Just . combWith comb)` for infinite lists. Alas, as I have mentioned under my answer, `combWith` is O(n), thus the accepted answer using `splitAt` is significantly more efficient. – Redu Apr 17 '20 at 16:35

3 Answers


Make increasing size chunks:

import Data.List (transpose)

chunks :: [a] -> [[a]]
chunks = go 0 where
    go _ [] = []
    go n as = b : go (n+1) e where (b,e) = splitAt n as
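
For example (the leading empty chunk comes from starting the counter at 0; `transpose` drops empty lists, so it is harmless):

> chunks [1..19]
[[],[1],[2,3],[4,5,6],[7,8,9,10],[11,12,13,14,15],[16,17,18,19]]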

Then just transpose twice:

diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks

Try it in ghci:

> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
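
It also copes with infinite input; a quick check (the same one as in the comments below):

> take 3 . map (take 3) . diagonalize $ [1..]
[[1,3,6],[2,5,9],[4,8,13]]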
Daniel Wagner
    Hm. Well, it occurs to me that I'm not super confident `transpose` is O(n). I'm also not super confident it's not -- its implementation is sort of complicated! – Daniel Wagner Apr 17 '20 at 02:38
    Do you think a variant of this could work on infinite lists? I'm genuinely curious. – MikaelF Apr 17 '20 at 02:41
  • This is spiritually what I'm going for in any case—thanks! – Peter Kagey Apr 17 '20 at 02:41
  • @MikaelF, this appears to work just fine on infinite lists. – Peter Kagey Apr 17 '20 at 02:42
  • @PeterKagey `take 10 $ diagonalize [1..]` doesn't seem so lazy... edit: I realized that I just confused "being lazy" with "working on infinite lists", but unless an endless loop is an option, in this case I think they're pretty much the same. – MikaelF Apr 17 '20 at 02:42
    @MikaelF Looks right to me...? `take 3 . map (take 3) . diagonalize $ [1..]` gives `[[1,3,6],[2,5,9],[4,8,13]]`, which seems fine. – Daniel Wagner Apr 17 '20 at 02:45
    That's because the first list in the list is itself infinite. `take 10 $ map (take 10) $ diagonalize [1..]` indeed gives the first ten elements of the first ten rows. – Peter Kagey Apr 17 '20 at 02:45
  • Can you do this more efficiently by breaking up into reversed chunks? – dfeuer Apr 17 '20 at 02:57
  • @MikaelF How long should the first chunk of the diagonalized infinite list be? – Davislor Apr 17 '20 at 03:30
  • @Davislor As PeterKagey pointed out, it's infinite, that's what tripped me up. – MikaelF Apr 17 '20 at 03:32
  • @MikaelF Right. I'm sure you can see how to diagonalize `[[1],[2,3],[4,5,6],...]`, and otherwise, I'm not sure what operation you mean. – Davislor Apr 17 '20 at 03:35
    @Davislor I was initially under the impression that `diagonalize` could account for how many lists you took. For example, the first list of `take 5 $ superLazy [1..]` could just be 5 long, and the function could return straight away. But now I realize that this would mean that `((!! 5) . take 5 . superLazy $ [1..]) /= ((!! 5) . take 6 . superLazy $ [1..])`, and therefore it can't be lazy in that way. – MikaelF Apr 17 '20 at 03:42
    This solution is fantastic. I built a solution using a lazy trie of integers and it pales in comparison to this, performance wise. Empirical measurements indicate that this is very close to linear time, also. I don't understand how... – luqui Apr 17 '20 at 23:34
  • Reading [this answer](https://stackoverflow.com/a/45044296/4543207) of yours I can see why `tail` is O(1) and `init` is O(n). What I cannot understand is why `splitAt` is seemingly not O(n) in this code. Does `right` in `(left, right)` possibly being O(1) have an effect on the O(n log n) overall performance of this code? – Redu Apr 18 '20 at 09:11
  • @Redu `splitAt` is O(n) in this code, but here `n` is the first argument to `splitAt`, not the length of the second list argument. All told, when running `chunks as`, the sum of all the `n`s we pass to `splitAt` is the length of `as`, so `chunks as` runs in O(`length as`). – Daniel Wagner Apr 18 '20 at 14:18

This appears to be directly related to the set-theory argument proving that the set of integer pairs is in one-to-one correspondence with the set of integers, i.e. that it is denumerable. The argument involves a so-called Cantor pairing function.

So, out of curiosity, let's see if we can get a diagonalize function that way. Define the infinite list of Cantor pairs recursively in Haskell:

auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair

cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)

And try that inside ghci:

 λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
 λ> 
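
The enumeration order is exactly that of the classic Cantor pairing function, which sends (x,y) to (x+y)(x+y+1)/2 + y. A quick sanity check in ghci (the `cantorIndex` helper is my addition, not part of the answer's recursion):

 λ> cantorIndex (x, y) = (x + y) * (x + y + 1) `div` 2 + y
 λ> map cantorIndex (take 5 cantorPairList)
[0,1,2,3,4]
 λ>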

We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:

 λ> 
 λ> xs = [1..]
 λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
 λ> 

We recognize the top row of the OP's expected result. Similarly for the next two rows:

 λ> 
 λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
 λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
 λ> 
 λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
 λ> 

From there, we can write our first draft of a diagonalize function:

 λ> 
 λ> printAsLines xs = mapM_ (putStrLn . show) xs
 λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
 λ> 
 λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
 λ> 
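
For reference, the same first draft collected outside of ghci (the type signatures are my reading of the session above):

makeRow :: [a] -> Integer -> [a]
makeRow xs row = map fst $ filter (\(_, (x, _)) -> x == row) $ zip xs cantorPairList

diagonalize :: [a] -> [[a]]
diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]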

EDIT: performance update

For a list of 1 million items, the runtime is 18 seconds, and it is 145 seconds for 4 million items. As mentioned by Redu, this looks like O(n√n) complexity.

Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.

To improve performance, we can use a Data.Map structure for the target sublists.


{-# LANGUAGE ExplicitForAll      #-}
{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.List as L
import qualified Data.Map  as M

type MIL a = M.Map Integer [a]

buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
    let ts  = zip xs cantorPairList   -- pairs (a, (x,y))
        m0  = M.empty :: MIL a
        redOp m (n, (x, _)) =
            let afn Nothing    = Just [n]
                afn (Just jas) = Just (n : jas)
            in  M.alter afn x m
        m1r = L.foldl' redOp m0 ts
    in
        fmap reverse m1r

diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
                 in  map snd $ M.toAscList cm


With that second version, performance appears to be much better: 568 msec for the 1 million item list, and 2669 msec for the 4 million item list. So it is close to the O(n log n) complexity we could have hoped for.
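
For what it's worth, a minimal sketch of the kind of driver behind such timings (compiled with ghc -O2; the `sum . map length` idiom, which forces every sublist and returns the input length, is borrowed from the comment threads on this page):

main :: IO ()
main = do
    let input = [1 .. 1000000] :: [Integer]
    -- Forces the whole result; prints the length of the input list.
    print (sum (map length (diagonalize input)))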

jpmarinier

It might be a good idea to create a comb filter.

So what does a comb filter do? It's like `splitAt`, but instead of splitting at a single index it sort of zips the given (possibly infinite) list with the given comb to separate the items corresponding to True and False in the comb. Such that:

comb :: [Bool]  -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id  -- i.e. concat (iterate (False:) [True])

combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ []          = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)

λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])

Now all we need to do is comb our infinite list, take the `fst` as the first row, and carry on combing the `snd` with the same comb.

Let's do it:

diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t

λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]

And it seems to be lazy too :)

λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]
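
As mentioned in the comments under the question, the recursion in `diags` can also be phrased with `unfoldr`. A sketch (`diagsInf` is just a name I picked); note this variant is only suitable for infinite lists, since returning `Just` unconditionally means it never terminates on finite input:

import Data.List (unfoldr)

diagsInf :: [a] -> [[a]]
diagsInf = unfoldr (Just . combWith comb)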

I think the complexity could be something like O(n√n), but I cannot make sure. Any ideas?

Redu
  • My first naïve solution did have O(n√n) complexity as well. Using a Data.Map structure to distribute the results to the target list of lists, there is a large improvement. Details at the end of my answer. – jpmarinier Apr 17 '20 at 13:25
  • @jpmarinier In many cases it could be tricky to obtain meaningful performance metrics due to laziness, but we can still get some feeling just by `:set +s`. Doing so, @Daniel Wagner's accepted answer seems to be running pretty fast with the list type. Could you please check how it compares to yours? I was hoping to achieve similar performance but `combWith` is nowhere near as fast as `splitAt`. – Redu Apr 17 '20 at 14:08
  • 1
  • I am a bit skeptical of using ghci for performance measurements, so I use ghc -O2. As for laziness, I print the evaluation of `sum $ map length (diagonalize input)`, which gives me back the length of the input list. @Daniel Wagner's solution runs about 20% faster than the Cantor map solution, so it's definitely in the O(n log n) camp. So Daniel's qualms about the nonlinearity of `transpose` seem unfounded. On top of that, it seems more laziness-friendly than the Cantor map. Well done! – jpmarinier Apr 17 '20 at 21:28
  • @jpmarinier Checking [this answer of @Daniel Wagner](https://stackoverflow.com/a/45044296/4543207), it seems like the `snd` of `splitAt`'s return value gets obtained in O(1) but the `fst` still should be O(n). Somehow this reflects on the overall O(n log n) performance of this code. – Redu Apr 18 '20 at 09:21
  • Yes, having just looked at the recursive definition of [splitAt](https://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.List.html#splitAt), it seems that the `drop n xs` part is essentially obtained for free as a side effect of getting `take n xs`. So Daniel is right to use `splitAt` instead of calling `drop` and `take` separately. – jpmarinier Apr 18 '20 at 15:29