Can you formulate the insertion sort as a monoid in Clojure?

Question

This is the code for an insertion sort in Clojure:

(defn in-sort! [data]
  (letfn [(insert ([raw x](insert [] raw x))
          ([sorted [y & raw] x]
             (if (nil? y) (conj sorted x)
             (if (<= x y ) (concat sorted [x,y] raw)
                 (recur (conj sorted y)  raw x )))))]   
    (reduce insert [] data)))
;Usage:(in-sort! [6,8,5,9,3,2,1,4,7])
;Returns: [1 2 3 4 5 6 7 8 9]

This is the insertion sort formulated as a monoid in Haskell:

newtype OL x = OL [x]
instance Ord x => Monoid (OL x) where
  mempty = OL []
  mappend (OL xs) (OL ys) = OL (merge xs ys) where
    merge [] ys = ys
    merge xs [] = xs
    merge xs@(x : xs') ys@(y : ys')
       | x <= y = x : merge xs' ys
       | otherwise = y : merge xs ys'

isort :: Ord x => [x] -> OL x
isort = foldMap (OL . pure)

This is how to write a monoid in Clojure:

(def  mempty (+)) ;; 0
(def  mappend +)
(defn mconcat [ms]
    (reduce mappend mempty ms))

(mappend 3 4) ;; 7

(mconcat [2 3 4]) ;; 9

My question is: Can you formulate the insertion sort as a monoid in Clojure?

That haskell code implements a merge sort, not an insertion sort. I'm not sure, but insertion sort doesn't seem to have the monoidal structure that merge sort does. — amalloy, Feb 24 '14 at 17:24
@amalloy no, it's insertion sort, because the folding tree is skewed fully to the right: `isort xs == foldr (merge . (:[])) [] xs` (i.e. left list is always singleton). — Will Ness, Feb 24 '14 at 17:54
In other words, it maps the collection to sort into a collection of singleton lists for the monoid to work on. The fold then merges the singleton lists into the accumulator. That's the same action as an insert. — A. Webb, Feb 24 '14 at 18:01

score 3 · Answer 1 · edited Feb 24 '14 at 18:34

3

Here is my attempt, might not be the best one though :)

It's quite a direct translation of the Haskell monoid. Since we don't have auto-currying in Clojure, I needed to make a special comp-2 function.

(defn comp-2 [f g]
  (fn [x y] (f (g x) (g y))))

(defn pure-list [x]
  (cond
   (sequential? x) (if (empty? x) '() (seq x))
   :else (list x)))

(def OL-mempty (list))
(defn OL-mappend [xs ys]
  (letfn [(merge [xs ys]
            (cond
             (empty? xs) ys
             (empty? ys) xs
             :else (let [[x & xs'] xs
                         [y & ys'] ys]
                     (if (<= x y) 
                       (cons x (lazy-seq (merge xs' ys)))
                       (cons y (lazy-seq (merge xs ys')))))))]
    (doall (merge xs ys))))

(defn foldmap [mempty mappend l]
  (reduce mappend mempty l))

(def i-sort (partial foldmap OL-mempty (comp-2 OL-mappend pure-list)))
(i-sort (list 5 3 4 1 2 6)) ;; (1 2 3 4 5 6)

Here is a link to a very nice paper about morphisms in sorts.

Compatibility with reducers

If we want to go with Reducers style monoid then we could embed "mempty" in our "mappend" as a zero-arity branch. Once we do that, we can make our monoid fit right away in the Reducers library:

(require '[clojure.core.reducers :as re])

(defn pure-list [x]
  (cond
   (sequential? x) (if (empty? x) '() (seq x))
   :else (list x)))

(defn sort-monoid 
  ([] '())      ;; mempty
  ([xs ys]      ;; mappend
     (letfn [(merge [xs ys]
               (cond
                (empty? xs) ys
                (empty? ys) xs
                :else (let [[x & xs'] xs
                            [y & ys'] ys]
                        (if (<= x y) 
                          (cons x (lazy-seq (merge xs' ys)))
                          (cons y (lazy-seq (merge xs ys')))))))]
       (doall (merge (pure-list xs) (pure-list ys))))))

(re/reduce sort-monoid (list 2 4 1 2 5))

edited Feb 24 '14 at 18:34

Will Ness

70,110
9
98
181

answered Feb 24 '14 at 11:44

Shlomi

4,708
1
23
32

1

This is nice. You should wrap the recursion in merge in a lazy-seq so you don't cause a stack overflow when sorting long sequences. – A. Webb Feb 24 '14 at 15:42
Actually, just that won't solve the problem, sorry. I'll have to think about it more later. – A. Webb Feb 24 '14 at 15:51
2

I've edited in the fix to save you one edit towards Community Wiki status (10), if you don't mind. Problem of possibly deep recursion was replaced by stacking of unrealized lazy sequences. I don't know enough about Haskell to know whether or not it would have the same problem. In any event, this would easily be solved by using an accumulator as this is tail recursive modulo a cons, but your version shows the parallel to the Haskell one. – A. Webb Feb 24 '14 at 16:38
@A.Webb in Haskell the guarded recursion (lazy equivalent of TRMC optimization) is usually the most efficient in these situations. – Will Ness Feb 24 '14 at 18:38
2

@WillNess Doing a lazy cons in these situations is usually not shied away from in Clojure either. This fails here in Clojure on larger inputs without some enforced strictness because the thunks become too deeply nested. If you iteratively build lazy sequences based on lazy sequences then you potentially blow the stack when forced to realize as the chain of thunks unravels. I rather suspect the same is true in Haskell based on your "beware the thunks blow-out!" comment back here http://stackoverflow.com/questions/13042353/does-haskell-have-tail-recursive-optimization. – A. Webb Feb 24 '14 at 18:54
@A.Webb wow, the *precision* of your reference! :) :) I don't know Clojure, but in Haskell this problem doesn't happen with `merge` because e.g. in `merge xs@(x : xs') ys@(y : ys') | x <= y = x : merge xs' ys` the thunk `merge xs' ys` is not explored until `x` is consumed (the thunk creation is guarded against by the lazy cons-tructor) and then it is destroyed when its head is demanded; so there's no nesting of thunks. Nesting of thunks (e.g. created by `foldl`) is the problem; chaining of guarded definitions is not. :) – Will Ness Feb 24 '14 at 19:02
@A.Webb btw I've re-written this kind of `merge` code in Scheme with explicit delay and it worked OK ([here](http://stackoverflow.com/a/12111632/849891), towards the end of the post). It goes smth like `(define (merge a b) (if ... (cons (car a) (lambda () (merge (cdr a) b))) ... ))` and so there's no nesting of thunks either. – Will Ness Feb 24 '14 at 19:08
@WillNess It is exactly the same here. A single merge is not a problem. Its the realization of the unrealized merge of an unrealized merge of an unrealized merge of an unrealized...from the fold that is. Does the Haskell version in the question blow the stack on a large shuffled input list? Do you fix it by forcing some strictness? – A. Webb Feb 24 '14 at 19:11
1

@A.Webb ah, that's embarrassing. Yes, of course, the insertion will force the right argument i.e. the recursive result; so yes, it will be a problem in Haskell too. Even though an insertion doesn't force its right arg to its end, but it demands at least one element from it, and so the whole chain will be forced to the last element of the input list. But this is sort, it must force through all the elts to produce head of result... The only solution is rearranging the tree of insertions into more balanced form to limit the depth of thunk-nesting. But then it becomes `mergesort`. – Will Ness Feb 24 '14 at 19:19
1

@A.Webb so for *insertion* sort, the `foldr` formulation is a disaster; `foldl` must be used, and exactly as you say, strictness better be enforced. – Will Ness Feb 24 '14 at 19:20

score 2 · Accepted Answer · edited Feb 24 '14 at 19:36

2

Here, for reference, is another version which turns the tail recursion modulo cons into tail recursion with an accumulator. For the sake of variety, here is also one way to partially simulate the absent type-classes.

(defprotocol Monoid 
  (mempty [_] ) 
  (mappend [_ xs ys]))

(defn fold-map
  [monoid f xs]
  (reduce (partial mappend monoid) (mempty monoid) (map f xs)))

(defn- ord-mappend* 
  [[x & rx :as xs] [y & ry :as ys] a] 
  (cond
    (empty? xs) (concat a ys)
    (empty? ys) (concat a xs)
    :else (if (< x y) 
             (recur rx ys (conj a x))
             (recur xs ry (conj a y)))))

(def Ord 
  (reify Monoid 
    (mempty [_] (list))
    (mappend [_ xs ys] (ord-mappend* xs ys []))))

(defn isort [xs] (fold-map Ord list xs))

(defn is-sorted? [xs] (apply < xs))

(is-sorted? (isort (shuffle (range 10000))))
;=> true (sometime later)

edited Feb 24 '14 at 19:36

Will Ness

70,110
9
98
181

answered Feb 24 '14 at 19:25

A. Webb

26,227
1
63
95

Could you expand on what you mean by 'one way to partially simulate the absent type-classes' – hawkeye Feb 24 '14 at 21:41
3

@hawkeye Monoid in Haskell is a typeclass, a contract for types. As you are aware, Clojure eschews a formal type system at the language level (gradually being added at the [library](https://github.com/clojure/core.typed) level). So, your series of questions - "Can you do X with typeclass Foo in Clojure?" - are somewhat awkward to answer. Often, the most truthful answer would be "Yes, but why bother when you don't have types?" Or, "That's not idiomatic for Clojure." Clojurians tend to roll their own DSLs to manage complexity as it arises rather than work in formalisms from the beginning. – A. Webb Feb 24 '14 at 22:48
1

@hawkeye Typeclasses in Haskell are extremely powerful for polymorphism. In Clojure, we take a lot of polymorphism for granted because we don't bother to specify type to begin with. But, there are protocols and multi-methods. These are very powerful, but not as powerful as Haskell's typeclasses. You can introduce something like a typeclass by specifying your contract in a protocol. This only dispatches on the first argument, whereas Haskell can dispatch on the entire signature, including return value. Multi-methods are more powerful than protocols, but not as powerful as Haskell's dispatch. – A. Webb Feb 24 '14 at 22:48
2

@hawkeye So, here I used a dummy first argument to dispatch the kind of Monoid I want and left the rest typeless. You can extend protocols to types in Clojure, but a type can be a Monoid in many different ways (e.g. [Integer, +, 0], [Integer, *, 1]). You'd need some way to distinguish which you are talking about. In Haskell, you can do this in the type signature. Here, I have you specify the desired Monoid at each use. When this becomes awkward, you can introduce a sugar macro as I did in answering one of your Monad questions. – A. Webb Feb 24 '14 at 22:49
@hawkeye This is only one possible solution. The various libraries that implement Monads, etc. in Clojure each seem to take a different approach. – A. Webb Feb 24 '14 at 22:49

Can you formulate the insertion sort as a monoid in Clojure?

2 Answers2

Compatibility with reducers

Linked