
I need to write a Byte Array value into Cassandra using Java code. Then a C++ program of mine will read that same Byte Array data back from Cassandra.

That Byte Array is made up of three values as described below -

short schemaId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] avroBinaryValue = os.toByteArray();

Now, I will write schemaId, lastModifiedDate and avroBinaryValue together into a single Byte Array, and I will write that resulting Byte Array into Cassandra. Then my C++ program will retrieve that Byte Array data from Cassandra and deserialize it to extract schemaId, lastModifiedDate and avroBinaryValue from it.

So now I am confused: should I be using big-endian byte order in my Java code while writing to Cassandra? Or little-endian byte order while storing the data into Cassandra?

Below is the code, I have got so far in Java which will serialize everything into a Single Byte Array...

public static void main(String[] args) throws Exception {

    String os = "whatever os is";
    byte[] avroBinaryValue = os.getBytes();

    long lastModifiedDate = 1379811105109L;
    short schemaId = 32767;

    ByteArrayOutputStream byteOsTest = new ByteArrayOutputStream();
    DataOutputStream outTest = new DataOutputStream(byteOsTest);

    outTest.writeShort(schemaId); // first write schemaId
    outTest.writeLong(lastModifiedDate); // second lastModifiedDate
    outTest.writeInt(avroBinaryValue.length); // then attributeLength
    outTest.write(avroBinaryValue); // then its value

    byte[] allWrittenBytesTest = byteOsTest.toByteArray();

    // write this allWrittenBytesTest into Cassandra

    // now deserialize it and extract everything from it
    DataInputStream inTest = new DataInputStream(new ByteArrayInputStream(allWrittenBytesTest));

    short schemaIdTest = inTest.readShort();

    long lastModifiedDateTest = inTest.readLong();

    int sizeAvroTest = inTest.readInt();
    byte[] avroBinaryValue1 = new byte[sizeAvroTest];
    inTest.readFully(avroBinaryValue1, 0, sizeAvroTest); // readFully guarantees all bytes are read


    System.out.println(schemaIdTest);
    System.out.println(lastModifiedDateTest);
    System.out.println(new String(avroBinaryValue1));

}

And I am also trying to see whether there is a more efficient or proper way of doing this in Java, since I need to retrieve this data from Cassandra using a C++ program and I don't want to have any problems on the C++ side. So I am trying to make sure that when I am writing this data to Cassandra from the Java side, everything looks good.
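Update: following the ByteBuffer suggestion in the comments, here is a minimal sketch of the same serialization with an explicit byte order (the class name ByteBufferDemo and the sample values are just for illustration; BIG_ENDIAN is also ByteBuffer's default):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ByteBufferDemo {
    public static void main(String[] args) {
        short schemaId = 32767;
        long lastModifiedDate = 1379811105109L;
        byte[] avroBinaryValue = "whatever os is".getBytes(StandardCharsets.UTF_8);

        // Allocate exactly enough room: 2 (short) + 8 (long) + 4 (int length) + payload.
        ByteBuffer buf = ByteBuffer.allocate(2 + 8 + 4 + avroBinaryValue.length);
        buf.order(ByteOrder.BIG_ENDIAN); // explicit, so the C++ side knows what to expect

        buf.putShort(schemaId);
        buf.putLong(lastModifiedDate);
        buf.putInt(avroBinaryValue.length);
        buf.put(avroBinaryValue);

        byte[] bytesToWrite = buf.array(); // this is what would go into Cassandra

        // Deserialize: wrap the raw bytes and read back in the same order.
        ByteBuffer in = ByteBuffer.wrap(bytesToWrite).order(ByteOrder.BIG_ENDIAN);
        short schemaIdBack = in.getShort();
        long lastModifiedBack = in.getLong();
        byte[] avroBack = new byte[in.getInt()];
        in.get(avroBack);

        System.out.println(schemaIdBack);
        System.out.println(lastModifiedBack);
        System.out.println(new String(avroBack, StandardCharsets.UTF_8));
    }
}
```

Since the byte order is pinned down explicitly, the C++ side can reassemble the values byte by byte (or with `ntohs`/`ntohl` for the 16/32-bit fields) without guessing at the machine's native endianness.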

Right now, for testing, I am writing this Byte Array into a file from the Java program, reading that same file with a C++ program, and then deserializing that Byte Array accordingly.
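As a sanity check for that file-based test, a small sketch like the one below (the class name FileRoundTrip is just for illustration) writes the serialized bytes to a temp file, reads them back, and confirms that DataOutputStream wrote the short 32767 as the big-endian bytes 7f ff:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileRoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize as in the question; DataOutputStream always writes big-endian.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeShort(32767);
        out.writeLong(1379811105109L);
        byte[] payload = "whatever os is".getBytes("UTF-8");
        out.writeInt(payload.length);
        out.write(payload);

        // Round-trip through a file, as in the Java-to-C++ test.
        Path file = Files.createTempFile("serialized", ".bin");
        Files.write(file, bos.toByteArray());
        byte[] back = Files.readAllBytes(file);
        Files.delete(file);

        // 2 + 8 + 4 + 14 payload bytes = 28 bytes total.
        System.out.println(back.length);
        // First two bytes are the short 32767 in big-endian order: 7f ff.
        System.out.println(String.format("%02x %02x", back[0] & 0xff, back[1] & 0xff));
    }
}
```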

I hope my question is clear enough. Can anybody help me with this?

AKIWEB
  • Are you aware [ByteBuffer](http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html) will allow you to specify big or little endian directly, then all you need to is ensure you decode it properly on the C++ side? [See this question](http://stackoverflow.com/questions/5625573/byte-array-to-short-array-and-back-again-in-java) for a similar, but not identical, example. – WhozCraig Oct 01 '13 at 07:10
  • @WhozCraig: Thanks for the suggestion.. I was not aware of ByteBuffer at all.. I just went through it.. It looks like I can make my Java program much better by using ByteBuffer while writing into Cassandra? Right? And then I can use the C++ program to specify which endianness I need to follow on the C++ side.. Is it possible for you to give me an example based on my above Java solution of how to do the same thing using ByteBuffer? It will be of great help to me.. Thanks.. – AKIWEB Oct 01 '13 at 17:52
  • The linked question in my comment has a pretty good set of samples on how to do it with a `short`. You should be able to do the same with a 32-bit or 64-bit `int` or `long`. "Knowing" the value in the byte-stream is big-endian or little endian (personally I would prefer the former) considerably simplifies the C++ code side, and SO has numerous examples on reassembling one in-code, or you can use `ntohl()`. – WhozCraig Oct 01 '13 at 18:36
  • @WhozCraig: I updated my question in which I am using ByteBuffer.. Can you please take a look and let me know I got the right thing? And also I am not sure how to deserialize if I am going with ByteBuffer route? – AKIWEB Oct 01 '13 at 20:52

1 Answer


Why not use a serialization framework like Google protobuf (http://code.google.com/p/protobuf/)? This way you need not worry about the low-level details, and you can read and write it back from any language and tools.

Pradheep
  • I cannot use it for some reason, as I don't want to serialize twice; my actual value is an Avro binary encoded value... And also I am not asking for an evaluation of different serialization frameworks.. – AKIWEB Oct 01 '13 at 06:02
  • Well, the better way to do it would be a serialization framework, as far as I know. Now that it's out of the question here, it does not matter as long as you are storing and reading with the same endianness. If you are going to read values from machines of different endianness, then use the most common one so that the number of conversions is smaller. – Pradheep Oct 01 '13 at 06:14
  • Yeah, I agree Pradheep.. But my actual value is an Avro binary encoded value, which is itself a data serialization format.. And I cannot binary encode the data again with some other serialization format, as I need to merge three Byte Arrays together into one... If I use another serialization format then I need to serialize/deserialize twice, which is not what I am looking for.. – AKIWEB Oct 01 '13 at 06:23