I have a binary file containing 8-byte floats (doubles). When I read it in Python using this code:

import array

d = array.array('d')
# array stores 'd' values in the machine's native byte order
with open("foo", mode="rb") as f:
    d.fromfile(f, 10)
print(d)

I get different results from this Java code run on the same file:

DataInputStream is;

try {
    is = new DataInputStream(new FileInputStream(FILE_NAME));

    int n = 0;

    while(n < 10) {
        System.out.println(is.readDouble());
        n++;
    }

    is.close();

} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

What am I doing wrong?

Here are the example outputs:

Java:

-6.519670308091451E91
-6.723367689137016E91
0.0
-6.519664091503568E91
1.2029778888642203E-19
1.2029778888642203E-19
1.2028455399662426E-19
-1.1421078747242632E217
2.2734939098318505E236
-3.494281168153119E125

Python:

array('d', [-1.504936576164858e-36, -5.878658489696332e-39, 0.0, -5.878658478748688e-39, 5.878658495170291e-39, 5.878658495170291e-39, -5.878655692573363e-39, -5.87865851296011e-39, 4.79728723e-315, 1.546036714e-314])

Here is the C program I'm using to generate the data:

#include <stdio.h>

double test_data[10] = {
    -1.504936576164858e-36, 
    -5.878658489696332e-39, 
    0.0, 
    -5.878658478748688e-39, 
    5.878658495170291e-39, 
    5.878658495170291e-39,
    -5.878655692573363e-39, 
    -5.87865851296011e-39, 
    4.79728723e-315, 
    1.546036714e-314
};

int main() {
    FILE * fp;
    fp = fopen("foo", "wb");
    if(fp != NULL) {
        fwrite(test_data, sizeof(double), 10, fp);
        fclose(fp);
    }
    return 0;
}
tylerjw
  • In what way different? – Ray Aug 05 '14 at 15:40
  • A 'file' is just a bunch of bytes. When trying to make sense of those bytes one *has to know* what is represented *how*. The first step would be to look at the file with a hex editor and figure out what is really in there and *how it is represented* (e.g. endianness, auxiliary information etc.). There is no *C-style* array, since C depends on the platform's endianness when it comes to physical representation. – Durandal Aug 05 '14 at 15:52

2 Answers

Java's DataInputStream always treats data as big-endian. You can see this using Python's struct module and your example data:

>>> import struct
>>> s = struct.pack("<d", -1.504936576164858e-36)
>>> s
'\xd3\x00\x00\xb9\xd3\x00\x80\xb8'
>>> struct.unpack("<d", s)
(-1.504936576164858e-36,)
>>> struct.unpack(">d", s)
(-6.519670308091451e+91,)

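You need to determine whether the data in your file is stored as little-endian or big-endian, and then interpret it with the matching byte order. Your C program writes the doubles with fwrite in the native byte order of the machine it runs on, which is little-endian on x86.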
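
As a rough sketch of what "interpreting it correctly" could look like on the Python side, the snippet below decodes the same bytes with an explicit byte order instead of relying on the machine default. It assumes Python 3; the helper name read_doubles and the in-memory stand-in raw (used here instead of the real file "foo") are made up for illustration.

```python
import struct
import sys

def read_doubles(data, count, byteorder):
    """Decode `count` 8-byte doubles from `data` using an explicit byte order.

    Hypothetical helper for illustration; `byteorder` is "little" or "big".
    """
    prefix = "<" if byteorder == "little" else ">"
    return list(struct.unpack_from(prefix + str(count) + "d", data))

# Stand-in for the file contents: two of the question's values,
# packed little-endian (an assumption about how the file was written).
raw = struct.pack("<2d", -1.504936576164858e-36, 0.0)

print(sys.byteorder)                     # native byte order of this machine
print(read_doubles(raw, 2, "little"))    # decoded as little-endian
print(read_doubles(raw, 2, "big"))       # decoded the way DataInputStream would
```

With a real file, the bytes from open("foo", "rb").read() could be passed in place of raw.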

bgporter

Java is reading the doubles as big-endian, while your data may be little-endian. To read a little-endian double in Java, you can use:

double d = Double.longBitsToDouble(Long.reverseBytes(is.readLong()));

This will read the eight bytes from the file as a long, swap the order of the individual bytes from one endianness to the other, and then convert the value to a double.
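
To make those three steps concrete, here is a small sketch of the same sequence in Python 3, using only the struct module. It is an illustration of the idea rather than a translation of any Java internals, and the function name swap_and_decode is hypothetical.

```python
import struct

def swap_and_decode(eight_bytes):
    """Mimic readLong / reverseBytes / longBitsToDouble on 8 raw bytes."""
    # Step 1: read the bytes as a big-endian signed 64-bit integer.
    (as_long,) = struct.unpack(">q", eight_bytes)
    # Step 2: reverse the byte order of that integer.
    reversed_bytes = as_long.to_bytes(8, "big", signed=True)[::-1]
    # Step 3: reinterpret the resulting bit pattern as a double.
    (value,) = struct.unpack(">d", reversed_bytes)
    return value

# Bytes of one value from the question, packed little-endian
# (assumed to match how the file was written).
sample = struct.pack("<d", -1.504936576164858e-36)
print(swap_and_decode(sample))
```

In Python itself the integer round trip is unnecessary, since struct.unpack("<d", ...) decodes little-endian data directly; the detour is only there to mirror the Java one-liner.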

David Conrad