I have a binary file that contains an X by X matrix. The file itself is a sequence of single-precision floats (little-endian). What I would like to do is parse it, and stuff it into some reasonable clojure matrix data type.

Thanks to this question, I see I can parse a binary file with gloss. I now have code that looks like this:

(ns foo.core
  (:require gloss.core)
  (:require gloss.io)
  (:use [clojure.java.io])
  (:use [clojure.math.numeric-tower]))

(gloss.core/defcodec mycodec
  (gloss.core/repeated :float32 :prefix :none))

(def buffer (byte-array (* 4 1200 1200))) ; 4 bytes per float32

(.read (input-stream "/path/to/binaryfile") buffer)

(gloss.io/decode mycodec buffer)

This takes a while to run, but eventually dumps out a big list of numbers. Unfortunately, the numbers are all wrong. Upon further investigation, the numbers were read as big-endian.

Assuming there is some way to read these binary files as little-endian, I'd like to stuff the results into a matrix. This question seems to have settled on using Incanter with its Parallel Colt representation; however, that question was from '09, and I'm hoping to stick to Clojure 1.4 and lein 2. Somewhere in my frenzy of googling, I saw other recommendations to use jblas or mahout. Is there a "best" matrix library for Clojure these days?

EDIT: Reading a binary file is tantalizingly close. Thanks to this handy nio wrapper, I am able to get a memory mapped byte buffer as a short one-liner, and even reorder it:

(ns foo.core
  (:require [clojure.java.io :as io])
  (:require [nio.core :as nio])
  (:import [java.nio ByteOrder]))

(def buffer (nio/mmap "/path/to/binaryfile"))

(class buffer) ;; java.nio.DirectByteBuffer

(.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
;; #<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=5760000 cap=5760000]>

However, reordering without the intermediate (def) step fails:

(.order (nio/mmap f) java.nio.ByteOrder/LITTLE_ENDIAN)

;; clojure.lang.Compiler$CompilerException: java.lang.IllegalArgumentException: Unable to resolve classname: MappedByteBuffer, compiling:(/Users/peter/Developer/foo/src/foo/core.clj:12)
;;  at clojure.lang.Compiler.analyzeSeq (Compiler.java:6462)
;;     clojure.lang.Compiler.analyze (Compiler.java:6262)
;; etc...

I'd like to be able to create the reordered byte buffer inside a function without defining a global variable, but right now that does not seem to work.

Also, once I've got it reordered, I'm not entirely sure what to do with my DirectByteBuffer, as it doesn't seem to be iterable. Perhaps for the remaining step of reading this buffer object (into a JBLAS matrix), I will create a second question.

EDIT 2: I am marking the answer below as accepted, because I think my original question combined too many things. Once I figure out the remainder of this I will try to update this question with complete code that starts with this ByteBuffer and that reads into a JBLAS matrix (which appears to be the right data structure).

In case anyone is interested, I was able to create a function that returns a properly ordered ByteBuffer as follows:

;; This works!
(defn readf [^String file]
  (.order
   (.map
    (.getChannel
     (java.io.RandomAccessFile. file "r"))
    java.nio.channels.FileChannel$MapMode/READ_ONLY 0 (* 4 1200 1200)) ; size in bytes: 4 per float32
   java.nio.ByteOrder/LITTLE_ENDIAN))

The nio wrapper I found looks to simplify / prettify this quite a lot, but it would appear I'm either not using it correctly, or there is something wrong. To recap my findings with the nio wrapper:

;; this works
(def buffer (nio/mmap "/bin/file"))
(def buffer (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN))
(def buffer (.asFloatBuffer buffer))

;; this fails
(def buffer
  (.asFloatBuffer
   (.order
    (nio/mmap "/bin/file")
    java.nio.ByteOrder/LITTLE_ENDIAN)))

Sadly, this is a clojure mystery for another day, or perhaps another StackOverflow question.

Peter

1 Answer

Open a FileChannel(), then get a memory mapped buffer. There are lots of tutorials on the web for this step.

Switch the order of the buffer to little endian by calling order(endian-ness) (not the no-arg version of order). Finally, the easiest way to extract floats would be to call asFloatBuffer() on it and use the resulting buffer to read the floats.
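
As a rough sketch, the steps above might be combined into a single function. The name `mapped-floats` is hypothetical, and the code assumes the whole file is a flat sequence of little-endian float32 values with no header:

```clojure
(import '[java.io RandomAccessFile]
        '[java.nio ByteOrder]
        '[java.nio.channels FileChannel$MapMode])

;; Hypothetical helper: memory-map a file and view it as little-endian floats.
;; Assumes the file contains nothing but float32 values.
(defn mapped-floats
  [^String path]
  (with-open [raf (RandomAccessFile. path "r")]
    (let [channel (.getChannel raf)]
      (-> (.map channel FileChannel$MapMode/READ_ONLY 0 (.size channel)) ; map the whole file
          (.order ByteOrder/LITTLE_ENDIAN)  ; the one-arg order sets the byte order
          (.asFloatBuffer)))))              ; view that inherits the byte order
```

A mapping, once established, does not depend on the channel that created it, so `with-open` can release the file handle before the buffer is used.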

After that you can put the data into whatever structure you need.
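
For instance, one possible way to get from a FloatBuffer to rows of a matrix is sketched below. The function names are made up for illustration, and the row-major layout is an assumption, since the file format is not specified beyond "a sequence of floats":

```clojure
(import '[java.nio FloatBuffer])

;; Hypothetical helper: copy the remaining floats of a buffer into a float array.
;; Works on a duplicate so the caller's position is left untouched.
(defn buffer->floats
  [^FloatBuffer fb]
  (let [dup (.duplicate fb)
        ^floats out (float-array (.remaining dup))]
    (.get dup out) ; bulk copy into the array
    out))

;; Hypothetical helper: split a flat sequence into rows of n-cols values.
;; Assumes the data is stored row by row (row-major).
(defn floats->rows
  [floats n-cols]
  (mapv vec (partition n-cols floats)))
```

Nested vectors box every value, so for a large matrix the flat float array is probably the better thing to hand to a numeric library.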

Edit: Here's an example of how to use the API.

;; first, I created a 96 byte file, then I started the repl
;; put some little endian floats in the file and close it
user=> (def file (java.io.RandomAccessFile. "foo.floats", "rw"))
#'user/file
user=> (def channel (.getChannel file))
#'user/channel
user=> (def buffer (.map channel java.nio.channels.FileChannel$MapMode/READ_WRITE 0 96))
#'user/buffer
user=> (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
#<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=96 cap=96]>
user=> (def fbuffer (.asFloatBuffer buffer))
#'user/fbuffer
user=> (.put fbuffer 0 0.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 1 1.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 2 2.3)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.close channel)
nil

;; memory map the file, try reading the floats w/o changing the endianness of the buffer
user=> (def file2 (java.io.RandomAccessFile. "foo.floats" "r"))
#'user/file2
user=> (def channel2 (.getChannel file2))                                                
#'user/channel2
user=> (def buffer2 (.map channel2 java.nio.channels.FileChannel$MapMode/READ_ONLY 0 96))
#'user/buffer2
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
4.6006E-41
user=> (.get fbuffer2 2)
4.1694193E-8

;; change the order of the buffer and read the floats    
user=> (.order buffer2 java.nio.ByteOrder/LITTLE_ENDIAN)                                 
#<DirectByteBufferR java.nio.DirectByteBufferR[pos=0 lim=96 cap=96]>
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
1.0
user=> (.get fbuffer2 2)
2.3
user=> (.close channel2)
nil
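
The odd value 4.6006E-41 in the first read can be reproduced without a file. The float 1.0 has the bit pattern 0x3F800000, so little-endian storage writes the bytes 00 00 80 3F; a big-endian reader (the ByteBuffer default) interprets them as 0x0000803F, which is a tiny denormal number. A minimal sketch, with `one-le` and `read-float` as names invented for illustration:

```clojure
(import '[java.nio ByteBuffer ByteOrder])

;; The four bytes of 1.0f as they appear in a little-endian file.
(def one-le
  (byte-array [(byte 0x00) (byte 0x00) (unchecked-byte 0x80) (byte 0x3F)]))

;; Hypothetical helper: decode the first float of a byte array in the given order.
(defn read-float
  [^bytes bs ^ByteOrder order]
  (-> (ByteBuffer/wrap bs)
      (.order order)
      (.getFloat 0)))

(read-float one-le ByteOrder/BIG_ENDIAN)    ;; => 4.6006E-41, as in the transcript
(read-float one-le ByteOrder/LITTLE_ENDIAN) ;; => 1.0
```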
BillRobertson42
  • Am I correct that you're telling me to use nio? After some digging I found this page (https://github.com/pjstadig/nio) which looks like it wraps some of these functions. Will do some more digging to see if I can figure this out. – Peter Oct 27 '12 at 15:06
  • Yes. You got it. I will try to add an example later in the day. – BillRobertson42 Oct 27 '12 at 15:27
  • Well, I've got a DirectByteBuffer back as a result of '(nio/mmap (nio/channel (io/file "/path/to/file")))'. Now to figure out order. Also, sorry to be a bit thick--new to Java and Clojure. – Peter Oct 27 '12 at 15:40
  • Call this function http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html#order%28java.nio.ByteOrder%29 and pass in java.nio.ByteOrder/LITTLE_ENDIAN to set it to little endian. – BillRobertson42 Oct 27 '12 at 17:38
  • This is turning out to be a bit of a learning curve for me! (def buffer (nio/mmap "/my/file")) (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN) works like a charm. But for some reason, I get a stack trace when I try to set the order inside of a let statement. (defn read-m [filepath] (let [buffer (nio/mmap filepath)] (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN))) barfs on compilation with an "Unable to resolve classname: MappedByteBuffer..." Is there any good reason this shouldn't work? – Peter Oct 29 '12 at 15:04
  • Can you add your code and full error to the question. Hard to read it from a comment like that. – BillRobertson42 Oct 29 '12 at 16:11
  • Okay, I've gone ahead and updated the question with code that reflects this approach. Many thanks! – Peter Oct 29 '12 at 19:44
  • I've gone ahead and marked this as accepted. I think I was asking too many things in my question. Thank you so much for your patience and willingness to provide examples! – Peter Oct 30 '12 at 14:12
  • @Peter That's an nio bug. See: https://github.com/pjstadig/nio/pull/5 It was fixed in version 1.0.2. – Cesar Canassa Dec 04 '13 at 18:07