
How to read a binary file in chunks in Scala

This is what I was trying to do:

import java.io.{DataInputStream, FileInputStream}

val fileInput = new FileInputStream("tokens")
val dis = new DataInputStream(fileInput)
var value = dis.readInt() // reads the first 4 bytes as a big-endian Int
println(value)

The value printed is a huge number, whereas the first value in the file should be 1.

Gaurav

1 Answer


Because you're seeing 16777216 where you'd expect a 1, it sounds like the endianness of the file differs from what DataInputStream expects. (That is, DataInputStream always reads multi-byte values in big endian/network byte order, and your file contains numbers in little endian.)
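To see where 16777216 comes from, here is a minimal sketch that interprets the same four bytes both ways. The object and method names (EndianDemo, readAs) are made up for illustration, and the byte array is an assumption about what the start of your file looks like.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object EndianDemo {
  // The integer 1 as a little-endian writer would store it:
  // least significant byte first. (Assumed file contents.)
  val littleEndianOne: Array[Byte] = Array[Byte](1, 0, 0, 0)

  // Interpret those four bytes under the given byte order.
  def readAs(order: ByteOrder): Int =
    ByteBuffer.wrap(littleEndianOne).order(order).getInt()

  def main(args: Array[String]): Unit = {
    println(readAs(ByteOrder.BIG_ENDIAN))    // 16777216, i.e. 0x01000000
    println(readAs(ByteOrder.LITTLE_ENDIAN)) // 1
  }
}
```

The big-endian reading treats the leading 0x01 as the most significant byte, which gives 1 * 2^24 = 16777216.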

That's a problem with a well-established range of solutions.

  • For example this page has a class that wraps the input stream and makes the problem go away.

  • Alternatively this page has functions that will read from a DataInputStream.

  • This StackOverflow answer has various snippets that will simply convert an int, if that's all you need to do.

  • Here's a Scala snippet that will add methods to read little endian numbers from the file.

The simplest fix is to swap the bytes around as you read them. You could do that by replacing the line that looks like

var value = dis.readInt()

with

var value = java.lang.Integer.reverseBytes(dis.readInt())
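If the file holds a whole sequence of little-endian ints, the same one-line swap can sit inside a loop. This is only a sketch: the names ReadAllLittleEndianInts and readAll are hypothetical, and it assumes the input is nothing but 32-bit ints back to back. It uses an in-memory stream so it runs without your "tokens" file.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, InputStream}

object ReadAllLittleEndianInts {
  // Reads little-endian 32-bit ints until the stream is exhausted.
  // Fewer than 4 trailing bytes also raise EOFException and are dropped.
  def readAll(in: InputStream): Vector[Int] = {
    val dis = new DataInputStream(in)
    val out = Vector.newBuilder[Int]
    try {
      while (true) out += Integer.reverseBytes(dis.readInt())
    } catch {
      case _: EOFException => () // normal end of input
    } finally dis.close()
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // Little-endian encodings of 1 and 2 (stand-in for a real file).
    val bytes = Array[Byte](1, 0, 0, 0, 2, 0, 0, 0)
    println(readAll(new ByteArrayInputStream(bytes))) // Vector(1, 2)
  }
}
```

For a real file you would pass a FileInputStream, ideally wrapped in a BufferedInputStream, instead of the byte array stream.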

If you wanted to make that a bit more concise, you could either implicitly add readXLE() methods to DataInput or extend DataInputStream with readXLE() methods. Unfortunately, the Java authors decided that the readX() methods should be final, so we can't override those to provide a transparent reader for little endian files.

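import java.io.{DataInput, DataInputStream, InputStream}
import scala.language.implicitConversions

object LittleEndianImplicits {
  // Implicit conversion that makes the readXLE() methods available on any DataInput.
  implicit def dataInputToLittleEndianWrapper(d: DataInput): DataInputLittleEndianWrapper = new DataInputLittleEndianWrapper(d)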

  class DataInputLittleEndianWrapper(d: DataInput) {
    def readLongLE(): Long = java.lang.Long.reverseBytes(d.readLong())
    def readIntLE(): Int = java.lang.Integer.reverseBytes(d.readInt())
    def readCharLE(): Char = java.lang.Character.reverseBytes(d.readChar())
    def readShortLE(): Short = java.lang.Short.reverseBytes(d.readShort())
  }
}

class LittleEndianDataInputStream(i: InputStream) extends DataInputStream(i) {
  def readLongLE(): Long = java.lang.Long.reverseBytes(super.readLong())
  def readIntLE(): Int = java.lang.Integer.reverseBytes(super.readInt())
  def readCharLE(): Char = java.lang.Character.reverseBytes(super.readChar())
  def readShortLE(): Short = java.lang.Short.reverseBytes(super.readShort())
}

object M {
  def main(a: Array[String]): Unit = {
    println("// Regular DIS")
    val d = new DataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + d.readInt())
    println("Int 2: " + d.readInt())

    println("// Little Endian DIS")
    val e = new LittleEndianDataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + e.readIntLE())
    println("Int 2: " + e.readIntLE())

    import LittleEndianImplicits._
    println("// Regular DIS with readIntLE implicit")
    val f = new DataInputStream(new java.io.FileInputStream("endian.bin"))
    println("Int 1: " + f.readIntLE())
    println("Int 2: " + f.readIntLE())
  }
}

The "endian.bin" file mentioned above contains a big endian 1 followed by a little endian 1. Running the above M.main() prints:

// Regular DIS
Int 1: 1
Int 2: 16777216
// Little Endian DIS
Int 1: 16777216
Int 2: 1
// Regular DIS with readIntLE implicit
Int 1: 16777216
Int 2: 1
Leif Wickland
  • @Gaurav, it may be obvious, but you might consider extending DataInputStream. – Ed Staub Feb 18 '12 at 02:23
  • @EdStaub, would you believe that Java makes it rather inconvenient to extend DataInputStream because that class marks readInt(), etc. as final? – Leif Wickland Feb 19 '12 at 15:35
  • Oops - should have checked for that, I've been burned enough with finality. Delegate, then, if it's worth it. In the example, where few DataInputStream methods are used, it probably is. – Ed Staub Feb 19 '12 at 16:21
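The delegation idea from the last comment could look roughly like the sketch below. LittleEndianReader is a hypothetical name, and only a few read methods are delegated; it wraps a DataInputStream instead of extending it, so the final readX() methods are not an obstacle.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, InputStream}

// Hypothetical wrapper: holds a DataInputStream and exposes
// little-endian reads under the usual method names.
final class LittleEndianReader(in: InputStream) extends AutoCloseable {
  private val dis = new DataInputStream(in)

  def readInt(): Int = Integer.reverseBytes(dis.readInt())
  def readLong(): Long = java.lang.Long.reverseBytes(dis.readLong())
  def readShort(): Short = java.lang.Short.reverseBytes(dis.readShort())
  def readByte(): Byte = dis.readByte() // a single byte has no byte order

  override def close(): Unit = dis.close()
}

object LittleEndianReaderDemo {
  def main(args: Array[String]): Unit = {
    // Little-endian encoding of 1 (stand-in for a real file).
    val reader = new LittleEndianReader(new ByteArrayInputStream(Array[Byte](1, 0, 0, 0)))
    try println(reader.readInt()) // 1
    finally reader.close()
  }
}
```

The trade-off is that the wrapper is no longer a DataInput, so it can't be passed to code that expects one.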