0

I'm new to Clojure. I was wondering how I could optimize an algorithm to count the number of inversions in a list. From what I understand, Clojure doesn't do tail call optimization unless specifically asked to? How do you get it to do this?

This first attempt with a mutated variable has a runtime of about 3.5s. But my second attempt was a functional version and it takes about 1m15s! and both require growing the stack size quite a bit (like -Xss12m).

How would I go about getting better performance?

I'd prefer to not have mutable variables (like the functional one) if possible. You can create the array file by typing something like seq 100000 | sort -R > IntArray.txt.

The first attempt w/ mutable variable:

(use 'clojure.java.io)

(def inversions 0)

(defn merge_and_count' [left right left_len]
  (if (empty? right) left
      (if (empty? left) right
          (if (<= (first left) (first right)) 
              (cons (first left)  (merge_and_count' (rest left) right (- left_len 1)))
              (let [_ (def inversions (+ inversions left_len))]
               (cons (first right) (merge_and_count' left (rest right) left_len)))
                  ))))

(defn inversion_count [list]
  (if (or (empty? list) (nil? (next list))) list
      (let [mid (quot (count list) 2)]
           (merge_and_count' (inversion_count (take mid list)) 
                             (inversion_count (drop mid list)) mid)
                                 )))

(defn parse-int [s]
   (Integer. (re-find  #"\d+" s )))

(defn get-lines [fname]
  (with-open [r (reader fname)]
    (doall (map parse-int (line-seq r)))))

(let [list (get-lines "IntArray.txt")
      _ (inversion_count list)]
  (print inversions))

My second attempt to be purely functional (no mutability):

(use 'clojure.java.io)


(defn merge_and_count' [left right inversions]
  (if (empty? right) (list left inversions)
      (if (empty? left) (list right inversions)
          (if (<= (first left) (first right)) 
              (let [result (merge_and_count' (rest left) right inversions)]
                   (list (cons (first left) (first result)) (second result)))
              (let [result (merge_and_count' left (rest right) (+ inversions (count left)))]
                   (list (cons (first right) (first result)) (second result)))
                       ))))

(defn inversion_count [list' list_len]
  (if (or (empty? list') (nil? (next list'))) (list list' 0)
      (let [mid (quot list_len 2)
            left (inversion_count (take mid list') mid)
            right (inversion_count (drop mid list') (- list_len mid))]
           (merge_and_count' (first left) (first right) (+ (second left) (second right)))
                                 )))

(defn parse-int [s]
   (Integer. (re-find  #"\d+" s )))

(defn get-lines [fname]
  (with-open [r (reader fname)]
    (doall (map parse-int (line-seq r)))))

(let [list (get-lines "IntArray.txt")
      result (inversion_count list 100000)]
  (print (second result)))
antimatter
  • 3,240
  • 2
  • 23
  • 34
  • What is this algorithm supposed to do? All I see is a giant pile of `first`, `rest`, and `second`. – amalloy May 11 '14 at 08:33
  • 1
    @amalloy See [here](http://www.geeksforgeeks.org/counting-inversions/), for example – Thumbnail May 11 '14 at 08:57
  • You need to grow the stack size because of the lack of TCO. Your recursive calls to merge_and_count' blow the stack. Replacing them with a "recur" call will ensure tail calls don't blow things. I'm still working out what you are actually up to:-) – pete23 May 11 '14 at 08:58
  • 1
    @pete23 his recursive calls are not in tail position, so replacing them with `recur` will replace the bad performance with a compiler error. – amalloy May 11 '14 at 08:59
  • Yes - been through that. Trying to tackle one problem at once though. Once I work out what this is actually doing... – pete23 May 11 '14 at 09:00
  • 1
    I'm assuming we're looking to solve this problem? http://stackoverflow.com/questions/337664/counting-inversions-in-an-array i.e. sum for each item in the list the # of successor entries that are < than it. The n^2 solution should be trivial. – pete23 May 11 '14 at 09:04
  • My question wasn't on how to solve it... The two examples above are working solutions. I was wondering how I can make it run faster. I'd try out recur, but @amalloy says it won't work. Is that the way you enable tail call optimization in Clojure? – antimatter May 11 '14 at 10:13
  • The algorithm counts the number of inversions in the array. So, for example, for an array [1,3,5,2,4,6], there are 3 inversions: (3,2), (5,2) and (5,4). Those are the locations in the array where an earlier index is larger than a later index (indicating that the array isn't sorted). – antimatter May 11 '14 at 10:20
  • 2
    Loop/recur isn't exactly "how you enable tail call optimisation in Clojure" - there is no tail call optimisation in Clojure. But loop/recur is an alternative that provides similar capabilities. The documentation is here: http://clojure.org/special_forms#Special%20Forms--(recur%20exprs*) – Paul Butcher May 11 '14 at 10:27
  • @gyeh There's no magic button you can press to say "please run faster". That's why I asked about the algorithm: an answer to improve performance will have to know how it works, and probably rewrite most of it. – amalloy May 11 '14 at 17:37
  • I'm not looking for a "magic button". I'm really wondering if theres an operation or two I'm using that's slowing it down, for example. Or if there's a much better approach, I'd like to know about it. You don't have to rewrite my code, I wasn't looking for that. Just an idea of how to move forward in improving the algo – antimatter May 12 '14 at 01:29

1 Answers1

3

The stack overflows due to the recursion in merge-and-count. I tried this approach, and for 100000 items, it came back instantly.

(defn merge_and_count [left right inversions]
  (loop [l left r right inv inversions result []]
    (cond (and (empty? r) (empty? l)) [result inv]
          (empty? r) [(apply conj result l) inv]
          (empty? l) [(apply conj result r) inv]
          (<= (first l) (first r)) (recur (rest l) r inv (conj result (first l)))
          :else (recur l (rest r) (+ inv (count l))  (conj result (first r))))))

You need to replace this code with code from your second approach.

grdvnl
  • 636
  • 6
  • 9
  • Thanks. So is loop/recur the de-facto way for Clojure to optimize recursion/tail recursion? Do you know why recursive calls aren't implicitly optimized in Clojure? It seems like a pretty fundamental aspect of functional programming. Does it have something to do with the JVM? – antimatter May 12 '14 at 11:58
  • 1
    There is some discussion here: http://stackoverflow.com/questions/19462314/why-cant-tail-calls-be-optimized-in-jvm-based-lisps. Also, look at Rich Hickey's comment: https://groups.google.com/forum/#!msg/clojure/4bSdsbperNE/tXdcmbiv4g0J Usually, recur lets you perform any operation that is possible in loops, but with the functional construct. This helps in thinking functionally and recursively. – grdvnl May 12 '14 at 12:03