I have a binary file that contains an X by X matrix. The file itself is a sequence of single-precision floats (little-endian). What I would like to do is parse it, and stuff it into some reasonable clojure matrix data type.
Thanks to this question, I see I can parse a binary file with gloss. I now have code that looks like this:
(ns foo.core
(:require gloss.core)
(:require gloss.io)
(:use [clojure.java.io])
(:use [clojure.math.numeric-tower]))
(gloss.core/defcodec mycodec
(gloss.core/repeated :float32 :prefix :none))
(def buffer (byte-array (* 1200 1200)))
(.read (input-stream "/path/to/binaryfile") buffer)
(gloss.io/decode mycodec buffer)
This takes a while to run, but eventually dumps out a big list of numbers. Unfortunately, the numbers are all wrong. Upon further investigation, the numbers were read as big-endian.
Assuming there is some way to read these binary files as little-endian, I'd like to stuff the results into a matrix. This question seems to have settled on using Incanter with its Parallel Colt representation, however, that question was from '09, and I'm hoping to stick to clojure 1.4 and lein 2. Somewhere in my frenzy of googling, I saw other recommendations to use jblas or mahout. Is there a "best" matrix library for clojure these days?
EDIT: Reading a binary file is tantalizingly close. Thanks to this handy nio wrapper, I am able to get a memory mapped byte buffer as a short one-liner, and even reorder it:
(ns foo.core
(:require [clojure.java.io :as io])
(:require [nio.core :as nio])
(:import [java.nio ByteOrder]))
(def buffer (nio/mmap "/path/to/binaryfile"))
(class buffer) ;; java.nio.DirectByteBuffer
(.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
;; #<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=5760000 cap=5760000]>
However, reordering without doing the intermediate (def) step, fails:
(.order (nio/mmap f) java.nio.ByteOrder/LITTLE_ENDIAN)
;; clojure.lang.Compiler$CompilerException: java.lang.IllegalArgumentException: Unable to resolve classname: MappedByteBuffer, compiling:(/Users/peter/Developer/foo/src/foo/core.clj:12)
;; at clojure.lang.Compiler.analyzeSeq (Compiler.java:6462)
;; clojure.lang.Compiler.analyze (Compiler.java:6262)
;; etc...
I'd like to be able to create the reordered byte buffer this inside a function without defining a global variable, but right now it seems to not like that.
Also, once I've got it reordered, I'm not entirely sure what to do with my DirectByteBuffer, as it doesn't seem to be iterable. Perhaps for the remaining step of reading this buffer object (into a JBLAS matrix), I will create a second question.
EDIT 2: I am marking the answer below as accepted, because I think my original question combined too many things. Once I figure out the remainder of this I will try to update this question with complete code that starts with this ByteBuffer and that reads into a JBLAS matrix (which appears to be the right data structure).
In case anyone was interested, I was able to create a function that returns a properly ordered bytebuffer as follows:
;; This works!
(defn readf [^String file]
(.order
(.map
(.getChannel
(java.io.RandomAccessFile. file "r"))
java.nio.channels.FileChannel$MapMode/READ_ONLY 0 (* 1200 1200))
java.nio.ByteOrder/LITTLE_ENDIAN))
The nio wrapper I found looks to simplify / prettify this quite a lot, but it would appear I'm either not using it correctly, or there is something wrong. To recap my findings with the nio wrapper:
;; this works
(def buffer (nio/mmap "/bin/file"))
(def buffer (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN))
(def buffer (.asFloatBuffer buffer))
;; this fails
(def buffer
(.asFloatBuffer
(.order
(nio/mmap "/bin/file")
java.nio.ByteOrder/LITTLE_ENDIAN)))
Sadly, this is a clojure mystery for another day, or perhaps another StackOverflow question.