
I have float data that I read from a file, and I know it is stored in little-endian form. To make sense of it, I need to convert it to big-endian.

How do I do this?

Here is my current code:

  private def bilToArray(dataFile: Path, nRows: Integer, nCols: Integer): Array[Array[Float]] = {
    val matrix = Array.ofDim[Float](nRows,nCols)
    var file = new File(dataFile.toString)
    var inputStream = new DataInputStream(new FileInputStream(file))

    for(i <- 0 to nRows){
      for(j <- 0 to nCols){
        matrix(i)(j) = inputStream.readFloat() // need to convert to big endian
      }
    }

    return matrix
  }
Brian Tompsett - 汤莱恩
Qubit
  • If it's actually a Java data stream, you don't need to know what endianness it is or worry about it. If it's binary data written by something other than DataOutputStream, you don't use DataInputStream; you just read bytes and use the methods on Double. – bmargulies Oct 22 '17 at 04:16
  • `index <- 0 to n...` will result in `ArrayIndexOutOfBoundsException`. – jwvh Oct 22 '17 at 04:38
  • Actually, this post makes sense, because there is no full answer for this exact question. Anyway, if it is helpful for you, @Qubit, here is my solution: https://codepad.remoteinterview.io/SPCHQRXAPG. You can use the method toBigEndians. Once you get the Array[Float], you can access it as a matrix rather than a vector. – Artavazd Balayan Oct 22 '17 at 06:27
  • In Java: `Float.intBitsToFloat(Integer.reverseBytes(inputStream.readInt()))`. If you have a bunch of bytes to convert, @ArtavazdBalayan’s solution using a `ByteBuffer` to do a bulk data conversion might be more efficient, especially when reading the bytes into the buffer in a single operation instead of `int` by `int`… – Holger Oct 23 '17 at 11:36
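
The one-liner from the last comment can be tried in isolation. The following is a minimal sketch, not the asker's code: the class name `LittleEndianFloatDemo` and the helper `readLittleEndianFloat` are made up for illustration, and the input is an in-memory byte array rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LittleEndianFloatDemo {
    // Hypothetical helper: reads one float that was stored little-endian.
    static float readLittleEndianFloat(DataInputStream in) throws IOException {
        // readInt() assembles the 4 bytes as big-endian; reverseBytes() swaps
        // them back into the order the writer intended.
        return Float.intBitsToFloat(Integer.reverseBytes(in.readInt()));
    }

    public static void main(String[] args) throws IOException {
        // 1.0f has the bit pattern 0x3f800000, so little-endian it is 00 00 80 3f
        byte[] raw = {0x00, 0x00, (byte) 0x80, 0x3f};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            System.out.println(readLittleEndianFloat(in)); // prints 1.0
        }
    }
}
```

This converts value by value, which is fine for small inputs; the bulk `ByteBuffer` approach mentioned in the same comment avoids the per-value call overhead.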

2 Answers


You can use ByteBuffer and ByteOrder to deal with endianness. This version has been improved following Holger's answer:

import java.nio._
import java.io._
import scala.annotation.tailrec
import scala.util.{Try, Success, Failure}

object FloatBigEndianLittleEndian {
    val floatSize: Int = java.lang.Float.SIZE / 8

    def main(args: Array[String]): Unit = {
        println(s"Float size: $floatSize")
        // floats in little endian (1, 2, 3, 4, 5)
        val littleEndians = Array[Int](0x0000803f, 0x00000040, 
                                       0x00004040, 0x00008040, 
                                       0x0000a040)

        val bs = new ByteArrayInputStream(getBytes(littleEndians))
        val ds = new DataInputStream(bs)

        val floats = toBigEndians(ds)
        println(floats)

        ds.close()
        bs.close()

    }
    // helper method that builds the Array[Byte] used to create the DataInputStream
    def getBytes(rawData: Array[Int]): Array[Byte] = {
        val b = ByteBuffer.allocate(rawData.length * floatSize)
        b.order(ByteOrder.BIG_ENDIAN)
        rawData.foreach(f => b.putInt(f))
        b.rewind()
        return b.array()
    }

    def toBigEndians(stream: DataInputStream): Seq[Float] = {
        val bf = streamToByteBuffer(stream)
        // rewind the buffer to make it ready for reading
        bf.rewind()

        // the bytes are stored little-endian, so decode them with LITTLE_ENDIAN order
        val floatBuffer = bf.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
        val n = floatBuffer.remaining

        @tailrec
        def floatBufferToArray_(idx: Int, floats: Array[Float]):  Array[Float] = {
            if (floatBuffer.hasRemaining) {
                // floatBuffer.get returns the current element and advances the position
                floats(idx) = floatBuffer.get
                floatBufferToArray_(idx + 1, floats)
            }
            else floats
        }
        // allocate result float array
        val floatArray = Array.ofDim[Float](n)
        floatBufferToArray_(0, floatArray)
    }
    def streamToByteBuffer(stream: DataInputStream): ByteBuffer = {
        @tailrec
        def streamToByteBuffer_(stream: DataInputStream, 
                               bf: ByteBuffer): ByteBuffer = {
            Try(bf.put(stream.readByte())) match {
                case Success(_) => streamToByteBuffer_(stream, bf)
                case Failure(ex) if ex.isInstanceOf[EOFException] => bf
                case Failure(ex) => throw ex
            }
        }
        // pre-allocate using stream.available, which is only an estimate for general streams
        val bf = ByteBuffer.allocateDirect(stream.available)
        streamToByteBuffer_(stream, bf)
    }
}
Artavazd Balayan
  • This points in the right direction, so you get my +1. It may benefit from using the `ByteBuffer` for the data transfer in the first place. I added an answer showing what Java code doing bulk transfers may look like. You may translate it to Scala, if you wish, as I’m not a Scala expert… – Holger Oct 23 '17 at 12:08

Artavazd Balayan’s answer points in the right direction. A ByteBuffer lets you interpret its contents as big-endian or little-endian, as you wish, and is the right tool for the job.
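
To illustrate how the byte order changes the interpretation, here is a small sketch that decodes the same four bytes both ways. The class name `ByteOrderDemo` and the helper `decode` are assumptions made for this example only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Hypothetical helper: decodes the first 4 bytes of raw as a float
    // using the given byte order.
    static float decode(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getFloat();
    }

    public static void main(String[] args) {
        // 1.0f (bit pattern 0x3f800000) stored little-endian
        byte[] raw = {0x00, 0x00, (byte) 0x80, 0x3f};
        System.out.println(decode(raw, ByteOrder.LITTLE_ENDIAN)); // prints 1.0
        // Read as big-endian, the bits are 0x0000803f: a tiny subnormal, not 1.0
        System.out.println(decode(raw, ByteOrder.BIG_ENDIAN));
    }
}
```

The bytes themselves are never rearranged; only the order in which the buffer assembles them into a value changes.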

But when you use it, there is no need for DataInputStream anymore. You can use the ByteBuffer for the transfer directly.

Java code that handles this efficiently might look like:

static float[][] bilToArray(Path dataFile, int nRows, int nCols) throws IOException {
    try(ReadableByteChannel fch = Files.newByteChannel(dataFile, StandardOpenOption.READ)){
        float[][] matrix = new float[nRows][nCols];
        ByteBuffer bb = ByteBuffer.allocateDirect(Float.BYTES*nRows*nCols);
        while(bb.hasRemaining()) if(fch.read(bb)<0) throw new EOFException();
        bb.flip();
        FloatBuffer fb = bb.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
        for(float[] row: matrix) fb.get(row);
        return matrix;
    }
}
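
One way to check the channel-based reader above is a round trip through a temporary file. This is only a sketch under assumptions: the class name `BilRoundTrip` and the writer `writeLittleEndian` are hypothetical and exist just to produce test input, and `bilToArray` is repeated so the example is self-contained.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class BilRoundTrip {
    // Same reader as above, repeated here so the example compiles on its own.
    static float[][] bilToArray(Path dataFile, int nRows, int nCols) throws IOException {
        try (ReadableByteChannel fch = Files.newByteChannel(dataFile, StandardOpenOption.READ)) {
            float[][] matrix = new float[nRows][nCols];
            ByteBuffer bb = ByteBuffer.allocateDirect(Float.BYTES * nRows * nCols);
            // keep reading until the buffer is full; fail if the file is too short
            while (bb.hasRemaining()) if (fch.read(bb) < 0) throw new EOFException();
            bb.flip();
            FloatBuffer fb = bb.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
            for (float[] row : matrix) fb.get(row);
            return matrix;
        }
    }

    // Hypothetical helper: writes the matrix row by row as little-endian floats.
    static void writeLittleEndian(Path dataFile, float[][] matrix) throws IOException {
        int count = 0;
        for (float[] row : matrix) count += row.length;
        ByteBuffer bb = ByteBuffer.allocate(Float.BYTES * count).order(ByteOrder.LITTLE_ENDIAN);
        for (float[] row : matrix) for (float v : row) bb.putFloat(v);
        Files.write(dataFile, bb.array());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bil-demo", ".bin");
        try {
            float[][] expected = {{1f, 2f, 3f}, {4f, 5f, 6f}};
            writeLittleEndian(tmp, expected);
            float[][] actual = bilToArray(tmp, 2, 3);
            // prints [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
            System.out.println(Arrays.deepToString(actual));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```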

If the file is very large and you know that it always resides in the default file system, you may even use a memory-mapped file:

static float[][] bilToArray(Path dataFile, int nRows, int nCols) throws IOException {
    try(FileChannel fch = FileChannel.open(dataFile, StandardOpenOption.READ)) {
        float[][] matrix = new float[nRows][nCols];
        ByteBuffer bb = fch.map(FileChannel.MapMode.READ_ONLY, 0, Float.BYTES*nRows*nCols);
        FloatBuffer fb = bb.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
        for(float[] row: matrix) fb.get(row);
        return matrix;
    }
}

Converting this to Scala shouldn’t be so hard.

Holger