8

How can I group nested collections based on column values, which are dynamically given? For example, suppose we have the following nested collections; how can I group it by the values in first and second columns?

[ ["A" 2011 "Dan"] ["A" 2011 "Jon"] ["A" 2010 "Tim"] ["B" 2009 "Tom"] ]

The desired resulting map is:

{ A { 
      2011 [['A', 2011, 'Dan'] ['A', 2011, 'Joe']]
      2010 [['A', 2010, 'Tim']] 
    }
  B { 2009 [['B', 2009, 'Tom']] } 
}

Following is my solution, which almost works:

(defn nest [data criteria]
  (if (empty? criteria)
    data
    (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
      (hash-map k (nest v (rest criteria))))))
Ari
  • 4,121
  • 8
  • 40
  • 56

5 Answers5

7

I came up with the following:

user=> (def a [["A" 2011 "Dan"] 
               ["A" 2011 "Jon"] 
               ["A" 2010 "Tim"] 
               ["B" 2009 "Tom"] ])

user=> (into {} (for [[k v] (group-by first a)] 
                  [k (group-by second v)]))

{"A" {2011 [["A" 2011 "Dan"] 
            ["A" 2011 "Jon"]], 
      2010 [["A" 2010 "Tim"]]}, 
 "B" {2009 [["B" 2009 "Tom"]]}}
Jonas
  • 19,422
  • 10
  • 54
  • 67
  • What can we do when the criteria is dynamically provided? For instance, what if at runtime the we are provided with a map, i.e. `{ :date 1 :name 2 }` -- which indicates that we want to group based on date and name which are the 2nd and 3rd columns in each nested collection? I've tried to recursively build the nested collection, but I haven't quite figured it out yet. – Ari Oct 07 '11 at 01:17
  • 1
    Change `first` to `#(nth % (:name amap))` and similarly `second` to `#(nth % (:date amap))` – Jonas Oct 07 '11 at 03:49
3

Generalization of group-by

I needed a generalization of group-by that’d produce more than 2-nested map-of-maps. I wanted to be able to give such a function a list of arbitrary functions to run recursively through group-by. Here’s what I came up with:

(defn map-function-on-map-vals
  "Take a map and apply a function on its values. From [1].
   [1] http://stackoverflow.com/a/1677069/500207"
  [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by
  "Like group-by but instead of a single function, this is given a list or vec
   of functions to apply recursively via group-by. An optional `final` argument
   (defaults to identity) may be given to run on the vector result of the final
   group-by."
  [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals (group-by (first fs) coll)
                              #(nested-group-by (rest fs) % final-fn))))

Your example

Applied to your dataset:

cljs.user=> (def foo [ ["A" 2011 "Dan"]
       #_=>            ["A" 2011 "Jon"]
       #_=>            ["A" 2010 "Tim"]
       #_=>            ["B" 2009 "Tom"] ])
cljs.user=> (require '[cljs.pprint :refer [pprint]])
nil
cljs.user=> (pprint (nested-group-by [first second] foo))
{"A"
 {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]},
 "B" {2009 [["B" 2009 "Tom"]]}}

Produces exactly the desired output. nested-group-by could take three or four or more functions and produces that many nestings of hash-maps. Perhaps this will be helpful to others.

Handy feature

nested-group-by also has a handy extra feature: final-fn, which defaults to identity so if you don’t provide one, the deepest nesting returns a vector of values, but if you do provide a final-fn, that is run on the innermost vectors. To illustrate: if you just wanted to know how many rows of the original dataset appeared in each category and year:

cljs.user=> (nested-group-by [first second] foo count)
                                               #^^^^^ this is final-fn
{"A" {2011 2, 2010 1}, "B" {2009 1}}

Caveat

This function doesn’t use recur so deeply-recursive calls could blow the stack. However, for the anticipated use-case, with only a small handful of functions, this shouldn’t be a problem.

Ahmed Fasih
  • 6,458
  • 7
  • 54
  • 95
1

I suspect the most idiomatic version of this is:

(defn nest-by
  [ks coll]
  (let [keyfn (apply juxt ks)]
    (reduce (fn [m x] (update-in m (keyfn x) (fnil conj []) x)) {} coll)))

This takes advantage of the fact that update-in already does most of what you want. In your particular case you'd then just go:

(nest-by [first second] [["A" 2011 "Dan"]
                         ["A" 2011 "Jon"]
                         ["A" 2010 "Tim"]
                         ["B" 2009 "Tom"] ])

{"A" {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]}, "B" {2009 [["B" 2009 "Tom"]]}}
Thom
  • 2,643
  • 2
  • 30
  • 33
1

Here's the solution I came up with. It works, but I'm sure it can be improved.

(defn nest [data criteria]
  (if (empty? criteria)
    data
    (into {} (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
      (hash-map k (nest v (rest criteria)))))))
Ari
  • 4,121
  • 8
  • 40
  • 56
  • I think it is simpler and more correct to use a vector of indices as the criteria instead of a map, seeing we never use the keys, and the vals are not guaranteed to be in order :) – Timothy Pratley Sep 23 '12 at 18:57
0

This gets you pretty close

(defn my-group [coll]                                                                                                                                                                                                                       
  (let [m (group-by                                                                                                                                                                                                                         
           #(-> % val first first)                                                                                                                                                                                                          
           (group-by #(second %) coll))]                                                                                                                                                                                                    
    (into {} (for [[k v] m] [k (#(into {} %) v)]))))                                                                                                                                                                                        

(my-group [["A" 2011 "Dan"] ["A" 2011 "Jon"] ["A" 2010 "Tim"] ["B" 2009 "Tom"]])                                                                                                                                                            

{"A" {                                                                                                                                                                                                                                      
      2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]],                                                                                                                                                                                             
      2010 [["A" 2010 "Tim"]]                                                                                                                                                                                                               
      },                                                                                                                                                                                                                                    
 "B" {2009 [["B" 2009 "Tom"]]}                                                                                                                                                                                                              
}

As usual with Clojure, you can probably find something that is a little less verbose.

Julien Chastang
  • 17,592
  • 12
  • 63
  • 89