
I guess this question qualifies as an entry-level Clojure problem. Basically, I'm having trouble processing a collection of Clojure maps multiple times and extracting different kinds of data.

Given data like this, I'm trying to count entries based on multiple nested keys:

[
  {
    "a": "X",
    "b": "M",
    "c": 188
  },
  {
    "a": "Y",
    "b": "M",
    "c": 165
  },
  {
    "a": "Y",
    "b": "M",
    "c": 313
  },
  {
    "a": "Y",
    "b": "P",
    "c": 188
  }
]

First, I want to group the entries by the a-key values:

{
  "X" : [
    {
      "b": "M",
      "c": 188
    }
  ],
  "Y" : [
    {
      "b": "M",
      "c": 165
    },
    {
      "b": "M",
      "c": 313
    },
    {
      "b": "P",
      "c": 188
    }
  ]
}

Second, I want to treat entries with the same b-key value as duplicates and ignore the remaining keys:

{
  "X" : [
    {
      "b": "M"
    }
  ],
  "Y" : [
    {
      "b": "M"
    },
    {
      "b": "P"
    }
  ]
}

Then, simply count all instances of the b-key:

{
  "X" : 1,
  "Y" : 2
}

As I'm getting the data through monger, I defined:

(defn db-query
  ([coll-name]
     (with-open [conn (mg/connect)]
       (doall (mc/find-maps (mg/get-db conn db-name) coll-name)))))
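
(This assumes the usual monger requires and a db-name var defined elsewhere, roughly like:)

(require '[monger.core :as mg]
         '[monger.collection :as mc])

(def db-name "my-db") ; placeholder, use your actual database name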

and then hitting the roadblock:

(defn get-sums [request]
  (->> (db-query "data")
       (group-by :a)
       (into {})
       keys))

How could I continue from here?

frhd

3 Answers


This is a naive approach; I'm sure there are better ways, but it might be enough to help you figure it out.

(into {}
  (map       

    ; f       
    (fn [ [k vs] ] ;[k `unique count`]
      [k (count (into #{} (map #(get % "b") vs)))]) 

    ; coll
    (group-by #(get % "a") DATA))) ; "a"s as keys
;user=> {"X" 1, "Y" 2}

Explanation:

; I am using your literal data as DATA, just removed the commas and colons
(def DATA [{...

(group-by #(get % "a") DATA) ; groups by "a" as keys
; so I get a map {"X":[{},...] "Y":[{},{},{},...]}

; then I map over each [k v] pair where
; k is the map key and
; vs are the grouped maps in a vector
(fn [ [k vs] ] 
      ; here `k` is e.g. "Y" and `vs` are the maps {a _, b _, c _}

      ; now `(map #(get % "b") vs)` gets me all the b values
      ; `into #{}` (a set) makes them unique
      ; `count` counts them
      ; finally I return a vector with the same name `k`,
      ;   but the value is the counted `b`s
      [k (count (into #{} (map #(get % "b") vs)))]) 

; at the end I just put the result `[ ["Y" 2] ["X" 1] ]` `into` a map {}
; so you get a map
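
For reference, here is what each intermediate step returns at the REPL, assuming DATA is bound to the literal data above (set printing order may vary):

(group-by #(get % "a") DATA)
;;=> {"X" [{"a" "X", "b" "M", "c" 188}],
;;    "Y" [{"a" "Y", "b" "M", "c" 165}
;;         {"a" "Y", "b" "M", "c" 313}
;;         {"a" "Y", "b" "P", "c" 188}]}

;; for the "Y" group:
(map #(get % "b") [{"a" "Y", "b" "M", "c" 165}
                   {"a" "Y", "b" "M", "c" 313}
                   {"a" "Y", "b" "P", "c" 188}])
;;=> ("M" "M" "P")

(into #{} ["M" "M" "P"])
;;=> #{"M" "P"}

(count #{"M" "P"})
;;=> 2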
birdspider
  • This works. How awesome. Could you explain how the `fn` for the `b`-part aggregates all the `b`-keys? It's strange because the first step is using group-by, and then it's not used in the second step, but it somehow must be grouped as well. – frhd Mar 21 '16 at 20:15
(def data [{"a" "X", "b" "M", "c" 188}
       {"a" "Y", "b" "M", "c" 165}
       {"a" "Y", "b" "M", "c" 313}
       {"a" "Y", "b" "P", "c" 188}])
;; Borrowing data from @leetwinski

One thing you might want to consider if you're defining the data is to use keywords instead of strings as the keys. This comes with the benefit of being able to use keywords as functions to access things in the map, i.e. (get my-map "a") becomes (:a my-map).
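
For example, a minimal sketch: clojure.walk/keywordize-keys converts string keys to keywords recursively (the map m here is just illustrative data):

(require '[clojure.walk :as walk])

(def m {"a" "X", "b" "M", "c" 188})

(get m "a")                        ;;=> "X"  (string keys need get)

(def m* (walk/keywordize-keys m))  ;; {:a "X", :b "M", :c 188}
(:a m*)                            ;;=> "X"  (keywords act as functions)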

To get the data grouped by "a" key:

(defn by-a-key [data] 
  (group-by #(get % "a") data))

I think you can actually skip your second step if it's only being used to get you to the third step, since it isn't needed for that. On a second reading I can't tell whether you want to keep only one element per distinct "b" key. I'm going to assume not, since you didn't specify which one to retain and the entries appear to be substantially different.

(reduce-kv 
  (fn [m k v] 
    (assoc m k 
      (count (filter #(contains? % "b") v)))) 
  {} 
  (by-a-key data))

You could also do the whole thing like so:

(frequencies (map #(get % "a") (filter #(contains? % "b") data)))

Since you can filter on the presence of the "b" key before grouping, you can rely on frequencies to group and count for you.
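
If, as the comments below clarify, only distinct "b" values per "a" group should be counted, one possible sketch is to drop duplicate a/b pairs before counting, still relying on frequencies:

(->> data
     (filter #(contains? % "b"))
     (map #(select-keys % ["a" "b"])) ; keep only the keys we care about
     distinct                         ; drop duplicate a/b combinations
     (map #(get % "a"))
     frequencies)
;;=> {"X" 1, "Y" 2}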

BWStearns
  • Thanks for explaining the keyword access. You're right, I actually wanna use keywords, not strings. – frhd Mar 22 '16 at 10:55
  • Btw, did you intend to drop one of the Y records with the "b" value of "M"? It was unclear from the slight difference between the text and the snippets. – BWStearns Mar 22 '16 at 20:14
  • Yes, only unique values are to be counted. I think your solution is correct in this regard. – frhd Mar 23 '16 at 08:47

You can do it using reduce:

(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])

(def processed (reduce #(update % (%2 "a") (fnil conj #{}) (%2 "b")) 
                       {} data))

;; {"X" #{"M"}, "Y" #{"M" "P"}}
;; you create a map of "a" values to a sets of "b" values in one pass
;; and then you just create a new map with counts

(reduce-kv #(assoc %1 %2 (count %3)) {} processed)

;; {"X" 1, "Y" 2}

So it uses the same logic as @birdspider's solution, but makes fewer passes over the collection.
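
As an aside, (fnil conj #{}) just supplies an empty set the first time an "a" key is seen, for example:

((fnil conj #{}) nil "M")     ;;=> #{"M"}       (no set for this key yet)
((fnil conj #{}) #{"M"} "P")  ;;=> #{"M" "P"}   (later values are added)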

In one function:

(defn process [data]
  (->> data
       (reduce #(update % (%2 "a") (fnil conj #{}) (%2 "b")) {})
       (reduce-kv #(assoc %1 %2 (count %3)) {})))

user> (process data)
;; {"X" 1, "Y" 2}
leetwinski