
What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?

I thought of:

(to-vect (trans (to-matrix my-dataset)))

But ideally, I'd like a lazy sequence. Is there a better way?

Rob Lachlan

3 Answers


Use the $ function.

=> (use 'incanter.core)
=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data)      ;; the :a column
=> ($ 0 :all data)  ;; the first row

=> (type ($ :a data))
clojure.lang.LazySeq
Paul Lam
  • True, the type of ($ :a data) is LazySeq, but when asking for more than one column, e.g. ($ :all data), the result is a Dataset. Is there any way of getting a sequence of columns when more than one is asked for? – 0dB Apr 23 '13 at 09:56
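
One possible way to address the comment above, as a minimal sketch: map $ over the column names returned by col-names. The helper name columns-seq is hypothetical, not part of Incanter. The sketch assumes incanter.core is available, and that $ given a single column key returns that column's values, as shown in the answer above.

```clojure
(use 'incanter.core)

;; Same sample data as in the answer above.
(def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))

;; columns-seq is a hypothetical helper, not an Incanter function.
;; map is lazy, so each column is only extracted when it is consumed.
(defn columns-seq
  [dataset]
  (map #($ % dataset) (col-names dataset)))

(columns-seq data) ;; one sequence of values per column
```

Only the outer sequence is lazy here. Whether each individual column is produced lazily depends on what $ returns in the Incanter version in use.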

Looking at the source code for to-vect, it uses map to build up the result, which already provides one degree of laziness. Unfortunately, it looks like the whole data set is first converted with toArray, which probably gives away all the benefits of map's laziness.

If you want more, you will probably have to dive into the gory details of the Java object that actually holds the matrix version of the data set and write your own version of to-vect.

skuro

You could use the internal structure of the dataset.

user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))

That said, I'm not sure to what extent the internal structure is part of the public API. You should probably check that with the Incanter developers.

kotarak