Partition a seq by a "windowing" predicate in Clojure

Question

I would like to "chunk" a seq into subseqs the same as partition-by, except that the function is not applied to each individual element, but to a range of elements.

So, for example:

(gather (fn [a b] (> (- b a) 2)) 
        [1 4 5 8 9 10 15 20 21])

would result in:

[[1] [4 5] [8 9 10] [15] [20 21]]

Likewise:

(defn f [a b] (> (- b a) 2))
(gather f [1 2 3 4]) ;; => [[1 2 3] [4]]
(gather f [1 2 3 4 5 6 7 8 9]) ;; => [[1 2 3] [4 5 6] [7 8 9]]

The idea is that I apply the start of the list and the next element to the function, and if the function returns true we partition the current head of the list up to that point into a new partition.

I've written this:

(defn gather
  [pred? lst]
  (loop [acc [] cur [] l lst]
    (let [a (first cur)
          b (first l)
          nxt (conj cur b)
          rst (rest l)]
      (cond
       (empty? l) (conj acc cur)
       (empty? cur) (recur acc nxt rst)
       ((complement pred?) a b) (recur acc nxt rst)
       :else (recur (conj acc cur) [b] rst)))))

and it works, but I know there's a simpler way. My question is:

Is there a built in function to do this where this function would be unnecessary? If not, is there a more idiomatic (or simpler) solution that I have overlooked? Something combining reduce and take-while?

Thanks.

this is the kind of problem that when I see it, I also think of a long solution, but I know that a clojure expert would have it solved in about 2 lines of code — johnbakers, Apr 22 '14 at 01:55

score 12 · Accepted Answer · edited Jun 20 '20 at 09:12

12

Original interpretation of question

We (all) seemed to have misinterpreted your question as wanting to start a new partition whenever the predicate held for consecutive elements.

Yet another, lazy, built on partition-by

(defn partition-between [pred? coll] 
  (let [switch (reductions not= true (map pred? coll (rest coll)))] 
    (map (partial map first) (partition-by second (map list coll switch)))))

(partition-between (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))

Actual Question

The actual question asks us to start a new partition whenever pred? holds for the beginning of the current partition and the current element. For this we can just rip off partition-by with a few tweaks to its source.

(defn gather [pred? coll]
  (lazy-seq
   (when-let [s (seq coll)]
     (let [fst (first s)
           run (cons fst (take-while #((complement pred?) fst %) (next s)))]
       (cons run (gather pred? (seq (drop (count run) s))))))))

(gather (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> ((1 2 3) (4))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> ((1 2 3) (4 5 6) (7 8 9))

edited Jun 20 '20 at 09:12

Community

1
1

answered Apr 22 '14 at 13:47

A. Webb

26,227
1
63
95

I love this one. All the magic is in `(reductions not= true (map f coll (rest coll)))`. I spent thirty minutes last night trying to remember how to do that. – galdre Apr 22 '14 at 15:10
That's pretty elegant. One issue I notice: (gather (fn [a b] (> (- b a) 2)) [1 2 3 4]) => ((1 2 3 4)) instead of ((1 2 3) (4)) – Scott Klarenbach Apr 22 '14 at 19:30
I've modified the original question to include the case where yours isn't working. – Scott Klarenbach Apr 22 '14 at 19:39
@Scott Thanks. It looks like I and, at a glance, everyone else misinterpreted your question! We seem to have all interpreted this as create a new partition whenever consecutive elements differ by more than 2, not when the element differs by more than 2 from the start of the partition. – A. Webb Apr 22 '14 at 19:43
Ya, sorry, should've included a couple more expected cases in the original quesiton. – Scott Klarenbach Apr 22 '14 at 19:45
@Scott Actual question was really clear enough, not sure why we (all) read it the other way. Anyway, actual question is a bit easier as it just takes a little tweak to the source of `partition-by`. Answer updated. – A. Webb Apr 22 '14 at 20:11
Works great, thanks! As a final last question, in clojure I thought you needed to use recur in order to avoid blowing up the jvm stack with large lists? I ask because I notice all the time code where a function calls itself directly (as is the case in your solution), which is nicer. I'm just wondering why recur isn't necessary in these instances. Or does it have something to do with the laziness of the function? If I should shutup and make that a new question just let me know ;) – Scott Klarenbach Apr 22 '14 at 20:21
@Scott See my (rather long-winded) answer to [this question](http://stackoverflow.com/q/22328304/1756702). – A. Webb Apr 22 '14 at 20:24
I am a huge Clojure fun, but can I say that this extremely elegant solution is almost unreadable unless you have a extremely high IQ? – agori Jul 24 '15 at 21:58

score 2 · Answer 2 · answered Apr 22 '14 at 01:43

Since you need to have the information from previous or next elements than the one you are currently deciding on, a partition of pairs with a reduce could do the trick in this case.

This is what I came up with after some iterations:

(defn gather [pred s]
  (->> (partition 2 1 (repeat nil) s) ; partition the sequence and if necessary
                                      ; fill the last partition with nils
    (reduce (fn [acc [x :as s]]
              (let [n   (dec (count acc))
                    acc (update-in acc [n] conj x)]
                (if (apply pred s)
                  (conj acc [])
                  acc)))
            [[]])))

(gather (fn [a b] (when (and a b) (> (- b a) 2)))
        [1 4 5 8 9 10 15 20 21])

;= [[1] [4 5] [8 9 10] [15] [20 21]]

The basic idea is to make partitions of the number of elements the predicate function takes, filling the last partition with nils if necessary. The function then reduces each partition by determining if the predicate is met, if so then the first element in the partition is added to the current group and a new group is created. Since the last partition could have been filled with nulls, the predicate has to be modified.

Tow possible improvements to this function would be to let the user:

Define the value to fill the last partition, so the reducing function could check if any of the elements in the partition is this value.
Specify the arity of the predicate, thus allowing to determine the grouping taking into account the current and the next n elements.

score 1 · Answer 3 · answered Apr 22 '14 at 02:45

I wrote this some time ago in useful:

(defn partition-between [split? coll]
  (lazy-seq
   (when-let [[x & more] (seq coll)]
     (lazy-loop [items [x], coll more]
       (if-let [[x & more] (seq coll)]
         (if (split? [(peek items) x])
           (cons items (lazy-recur [x] more))
           (lazy-recur (conj items x) more))
         [items])))))

It uses lazy-loop, which is just a way to write lazy-seq expressions that look like loop/recur, but I hope it's fairly clear.

I linked to a historical version of the function, because later I realized there's a more general function that you can use to implement partition-between, or partition-by, or indeed lots of other sequential functions. These days the implementation is much shorter, but it's less obvious what's going on if you're not familiar with the more general function I called glue:

(defn partition-between [split? coll]
  (glue conj []
        (fn [v x]
          (not (split? [(peek v) x])))
        (constantly false)
        coll))

Note that both of these solutions are lazy, which at the time I'm writing this is not true of any of the other solutions in this thread.

score 0 · Answer 4 · answered Apr 22 '14 at 02:18

Here is one way, with steps split up. It can be narrowed down to fewer statements.

(def l [1 4 5 8 9 10 15 20 21])

(defn reduce_fn [f x y]
  (cond
   (f (last (last x)) y) (conj x [y])
   :else (conj (vec (butlast x)) (conj (last x) y)) )
  )

 (def reduce_fn1 (partial reduce_fn #(> (- %2 %1) 2)))

 (reduce reduce_fn1 [[(first l)]] (rest l))

score 0 · Answer 5 · answered Apr 22 '14 at 02:42

keep-indexed is a wonderful function. Given a function f and a vector lst,

(keep-indexed (fn [idx it] (if (apply f it) idx))
       (partition 2 1 lst)))

(0 2 5 6)

this returns the indices after which you want to split. Let's increment them and tack a 0 at the front:

(cons 0 (map inc (.....)))

(0 1 3 6 7)

Partition these to get ranges:

(partition 2 1 nil (....))

((0 1) (1 3) (3 6) (6 7) (7))

Now use these to generate subvecs:

(map (partial apply subvec lst) ....)

([1] [4 5] [8 9 10] [15] [20 21])

Putting it all together:

(defn gather
  [f lst]
  (let [indices (cons 0 (map inc
                   (keep-indexed (fn [idx it]
                                    (if (apply f it) idx))
                       (partition 2 1 lst))))]
    (map (partial apply subvec (vec lst))
       (partition 2 1 nil indices))))

(gather #(> (- %2 %) 2) '(1 4 5 8 9 10 15 20 21))
([1] [4 5] [8 9 10] [15] [20 21])

If there is a faster way to get from the indices to the final solution, please feel free to edit. — galdre, Apr 22 '14 at 02:45
Thanks a lot for keep-indexed. That's a useful function also. — Scott Klarenbach, Apr 22 '14 at 20:18

Partition a seq by a "windowing" predicate in Clojure

5 Answers5

Original interpretation of question

Actual Question

Linked