4

Native code:

Writing the number 27 using fwrite():

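#include <errno.h>
#include <stdio.h>

int main(void)
{
  int a = 27;
  FILE *fp;
  fp = fopen("/data/tmp.log", "w");
  if (!fp)
     return -errno;

  /* writes the 4 bytes of a in the CPU's native byte order */
  fwrite(&a, 4, 1, fp);
  fclose(fp);
  return 0;
}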

Reading back the data (27) using DataInputStream.readInt():

public int readIntDataInputStream() throws IOException
{
   String filePath = "/data/tmp.log";
   InputStream is = null;
   DataInputStream dis = null;
   int k;

   is = new FileInputStream(filePath);
   dis = new DataInputStream(is);
   k = dis.readInt();   // always interprets the 4 bytes as big-endian
   dis.close();
   Log.i(TAG, "Size : " + k);
   return 0;
}

Output:

Size : 452984832

In hex, that is 0x1b000000.

0x1b is 27. But readInt() reads the data as big-endian, while my native code writes it as little-endian. So instead of 0x0000001b I get 0x1b000000.

Is my understanding correct? Has anyone come across this problem before?

Sandeep
  • 1
    Yes, you are correct. C will write in endianness of CPU, which for x86 processors is little-endian. [`DataInputStream.readInt()`](https://docs.oracle.com/javase/8/docs/api/java/io/DataInput.html#readInt--) will always read big-endian. Solution: Decide which endianness your file should have, and make sure both act accordingly. – Andreas Dec 09 '16 at 06:21
  • 2
    More to the point, decide that the file should be big-endian, which makes it portable *and* compatible with Java, and adjust the C code accordingly. All you need in this C code is `int a = htonl(27);` – user207421 Dec 09 '16 at 06:23
  • Thanks @Andreas. I have large amounts of data to write. How can i handle this effectively in C? – Sandeep Dec 09 '16 at 06:24
  • @EJP Actually I have very large amount of data that i am writing into a file from native code and reading from Java application.. Is there a recommended way for this? – Sandeep Dec 09 '16 at 06:25
  • I've just given you one. – user207421 Dec 09 '16 at 06:26
  • I disagree somewhat with @EJP. File doesn't have to be big-endian, though big-endian (also known as network byte order) is the most commonly used endianness for data interchange. You just need to decide what it should be, and make sure C writes that, and Java reads that. In Java, the easiest way to control endianness, is to use `ByteBuffer`. In C, you'd build a byte array (`char[]`) and convert `int` values to `char` using bit-shifting. – Andreas Dec 09 '16 at 06:29
  • 1
    see also http://stackoverflow.com/questions/5078100/fast-reading-of-little-endian-integers-from-file – Scary Wombat Dec 09 '16 at 06:30
  • @Andreas Using the standard is always preferable, and the standard is network byte order. – user207421 Dec 09 '16 at 06:30
  • @EJP I have large bunch of binary data that i am writing into a file. If i do htonl() for 4 bytes individually i guess that will not look good. So, may be I have to change my design of solution.. – Sandeep Dec 09 '16 at 06:32
  • @Andreas how about other api's in DataInputStream()? Like readFully().. If I write binary data and use this API, will that be ok? I am going to try those options. But a word of wisdom will certainly help me. – Sandeep Dec 09 '16 at 06:36
  • @mk.. Why would you use `htonl()`? byte1 = i >> 24; byte2 = i >> 16; byte3 = i >> 8; byte4 = i; – Andreas Dec 09 '16 at 06:36
  • @mk If you're just going to read bytes into a `byte[]` and maybe use a `ByteBuffer` to extract `int` values from such byte arrays, don't use `DataInputStream`. Use `InputStream` directly, or rather `BufferedInputStream` for better performance. `DataOutputStream`/`DataInputStream` are designed for sending data between Java programs. Do not use them for interchange with other languages. – Andreas Dec 09 '16 at 06:38
  • OK. Actually, the data in the file is written by the native layer and read by the Java layer. The data is of the format .... e.g. SPS_FRAME1b000000.... The Android application has to read the complete "size" bytes and wait until that much data is available. So I wanted the readFully() API for this purpose, as it is convenient. I see that this API is only available in DataInputStream. Is it available in other interfaces? @Andreas – Sandeep Dec 09 '16 at 06:46
  • Calling [`InputStream.read(byte[] b, int off, int len)`](http://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#read-byte:A-int-int-) repeatedly until all bytes have been received is not that difficult. It's a fairly simple loop. – Andreas Dec 09 '16 at 06:49
  • @Andreas Agreed. I will try that and come back if I have any concerns. Thank you – Sandeep Dec 09 '16 at 07:03
  • @Andreas Nonsense. `DataInput/OutputStream` are *specifically designed* for data interchange *with other languages* and platforms. That's *why* they use network byte order. – user207421 Nov 20 '17 at 01:57
  • @EJP Don't know why you revived this old thread, but nothing in the javadoc of those objects says anything of the sort. `DataOutputStream` is for writing **primitive Java data types** in a portable way, i.e. so the data can be **read by a `DataInputStream`**. It's not so the data can be read by other languages, and it's "portable" as in on every platform where Java runs. Using network byte order is more standard, sure, but that is not the listed purpose. – Andreas Nov 20 '17 at 10:17
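
The "fairly simple loop" suggested in the comments above for replacing readFully() might look like the following. This is only a sketch: the class name `ReadFullyExample` and the helper `readFully` are hypothetical, and a `ByteArrayInputStream` stands in for the real file stream. Note that on a plain file, `read()` returns -1 at end of file rather than blocking, so waiting for a writer to append more data would need extra handling that is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyExample {
    /** Reads exactly len bytes into buf starting at off, or throws EOFException. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            // read() may return fewer bytes than requested, so keep asking for the rest
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                throw new EOFException("stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file: five bytes, of which we want the first four
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] payload = new byte[4];
        readFully(in, payload, 0, payload.length);
        System.out.println(Arrays.toString(payload));
    }
}
```

Wrapping the source in a `BufferedInputStream`, as suggested above, should not change the loop.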

2 Answers

2

From the Javadoc for readInt():

This method is suitable for reading bytes written by the writeInt method of interface DataOutput

If you want to read something written by a C program you'll have to do the byte swapping yourself, using the facilities in java.nio. I've never done this but I believe you would read the data into a ByteBuffer, set the buffer's order to ByteOrder.LITTLE_ENDIAN and then create an IntBuffer view over the ByteBuffer if you have an array of values, or just use ByteBuffer#getInt() for a single value.
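
A minimal sketch of the ByteBuffer approach described above, assuming the four bytes have already been read from the file into a `byte[]`. The class and method names (`LittleEndianRead`, `readLittleEndianInt`, `readLittleEndianInts`) are made up for illustration; only the `java.nio` calls are from the standard library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class LittleEndianRead {
    /** Interprets the first 4 bytes of raw as a little-endian int. */
    static int readLittleEndianInt(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Interprets raw as a sequence of little-endian ints (length must be a multiple of 4). */
    static int[] readLittleEndianInts(byte[] raw) {
        IntBuffer view = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer();
        int[] values = new int[view.remaining()];
        view.get(values);
        return values;
    }

    public static void main(String[] args) {
        // The bytes a little-endian C program writes for int 27
        byte[] raw = {0x1b, 0x00, 0x00, 0x00};
        System.out.println(readLittleEndianInt(raw)); // prints 27
    }
}
```

The order must be set before calling `getInt()` or `asIntBuffer()`, since a new `ByteBuffer` defaults to big-endian.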

All that aside, I agree with @EJP that the external format for the data should be big-endian for greatest compatibility.

Jim Garrison
  • `ByteBuffer` has a [`getInt()`](https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html#getInt--) to read the next 4 bytes as an `int` in the given endianness. `IntBuffer` view is only useful if all data is `int`, e.g. if it is an `int[]`. – Andreas Dec 09 '16 at 06:34
0

There are multiple issues in your code:

  • You assume that the size of int is 4, which is not necessarily true. Since you want to deal with 32-bit ints, you should use int32_t or uint32_t.

  • You must open the file in binary mode to write binary data reliably. The above code would fail on Windows for less trivial output. Use fopen("/data/tmp.log", "wb").

  • You must deal with endianness. You are using the file to exchange data between different platforms that may have different native endianness and/or endian-specific APIs. Java's DataInputStream reads big-endian, aka network byte order, so you should convert the values on the C side with the htobe32() utility function from <endian.h> (or the equivalent htonl()). It is unlikely to have a significant impact on performance, as this function is usually expanded inline, possibly as a single instruction, and most of the time will be spent waiting for I/O anyway.

Here is a modified version of the code:

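#include <endian.h>   /* htobe32(); non-standard, available on glibc and similar libcs */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* convert to big-endian so that Java's readInt() sees 27 */
    uint32_t a = htobe32(27);
    FILE *fp = fopen("/data/tmp.log", "wb");
    if (!fp) {
        return errno;
    }
    fwrite(&a, sizeof a, 1, fp);
    fclose(fp);
    return 0;
}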
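
To check the effect on the Java side without touching a real file, the two byte layouts can be fed to DataInputStream directly. This is only an illustrative sketch: the class name `ByteOrderCheck` and the helper `readIntAsJava` are hypothetical, and the byte arrays stand in for the file contents produced by the original and the modified C code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ByteOrderCheck {
    /** Reads one int the way the question's Java code does, i.e. as big-endian. */
    static int readIntAsJava(byte[] fileBytes) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            return dis.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] nativeOrder = {0x1b, 0x00, 0x00, 0x00}; // 27 written as-is on a little-endian CPU
        byte[] networkOrder = {0x00, 0x00, 0x00, 0x1b}; // 27 converted to big-endian before writing

        System.out.println(readIntAsJava(nativeOrder));  // prints 452984832
        System.out.println(readIntAsJava(networkOrder)); // prints 27
    }
}
```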
chqrlie
  • Hi chqrlie, thanks for the answer. Regarding points 1 and 2, I am aware of these things. Also, this is only test code. Point 1: I took it as 4 bytes because readInt() in Java will read exactly 4 bytes anyway. Point 2: I am working on Unix systems, where the 'b' in fopen has no significance. From the man page: "This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux." But these are good points to make the program look elegant. Thanks. – Sandeep Dec 11 '16 at 05:29
  • @mk..: I understand the posted code is just a quick and dirty test. I always try to provide a detailed answer for not just the OP, but also other readers to see all the potential issues. `"wb"` is strictly equivalent to `"w"` on most Unix platforms, but it does not hurt to use `b` and it makes it more obvious that `"/data/tmp.log"` is a binary file, which the name does not imply. `int` is 32-bit long on the vast majority of Unix systems, but the size of `long` (64-bit in java) varies across ABIs, even on the same host (32 vs 64 bit mode). Elegance should become a second nature. – chqrlie Dec 11 '16 at 09:53