
I'm trying to send a list of 1000 long values, each with 19 digits, using protobuf. I used repeated fixed64 as the field type because it is recommended for values that are often larger than 2^56 and it always encodes to 8 bytes. My proto definition is:

message ListLong {
    repeated fixed64 value = 1;
}

I created the proto object using the following method.

static ListLong createListOfLong(List<Long> longList) {
    ListLong.Builder list = ListLong.newBuilder();
    for(long value : longList) {
        list.addValue(value);
    }
    return list.build();
}

I serialized this message of 1000 long values to a byte array. The byte array is 8004 bytes long. I measured the processing time, CPU time, and allocated memory with the following code, before and after object creation and serialization.

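// Imports needed by this snippet:
// import java.lang.management.ManagementFactory;
// import com.sun.management.ThreadMXBean; // HotSpot extension that declares getThreadAllocatedBytes
ThreadMXBean tmx = (ThreadMXBean) ManagementFactory.getThreadMXBean();
Long cpuStartTime = tmx.getCurrentThreadCpuTime();
Long startMemory = tmx.getThreadAllocatedBytes(Thread.currentThread().getId());
Long startTime = System.currentTimeMillis();
// createListOfLong returns a ListLong message, not a Long
ListLong protoList = createListOfLong(collectionList);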
System.out.println("Cpu usage time for proto creation  "+(tmx.getCurrentThreadCpuTime()-cpuStartTime)/1000000+"ms");
System.out.println("Heap Memory usage for proto creation  "+(tmx.getThreadAllocatedBytes(Thread.currentThread().getId())-startMemory)/1000+"KB");
System.out.println("object creation time is "+(System.currentTimeMillis()-startTime)+" ms");
startMemory = tmx.getThreadAllocatedBytes(Thread.currentThread().getId());
cpuStartTime = tmx.getCurrentThreadCpuTime();
startTime = System.currentTimeMillis();
byte[] b = protoList.toByteArray();
System.out.println("Heap Memory usage for serializing  "+(tmx.getThreadAllocatedBytes(Thread.currentThread().getId())-startMemory)/1000+"KB");
System.out.println("Serialized time is "+(System.currentTimeMillis()-startTime)+" ms");
System.out.println("Cpu usage time for serializing  "+(tmx.getCurrentThreadCpuTime()-cpuStartTime)/1000000+"ms");

I got the following results when running it in IntelliJ IDEA.

Cpu usage time for proto creation  54ms
Heap Memory usage for proto creation  5756KB
object creation time is 56 ms
Heap Memory usage for serializing  3090KB
Serialized time is 29 ms
Cpu usage time for serializing  28ms
Serialized length is 8004
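
These figures come from a single run, so they probably include one-time costs such as class loading and JIT compilation rather than the steady-state cost of building and serializing the message. A minimal sketch of a warmed-up measurement follows. `WarmTiming` and `averageNanos` are hypothetical names, and the task in `main` is a stand-in for the protobuf calls; a harness such as JMH would be more rigorous than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: runs a task unmeasured first, then averages timed runs.
class WarmTiming {

    static <T> long averageNanos(Supplier<T> task, int warmupRuns, int measuredRuns) {
        Object last = null;
        // Warm-up: lets class loading and JIT compilation happen before timing starts.
        for (int i = 0; i < warmupRuns; i++) {
            last = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            last = task.get();
        }
        long elapsed = System.nanoTime() - start;
        // Touch the result so the work is not trivially unused.
        if (last == null) {
            throw new IllegalStateException("task returned null");
        }
        return elapsed / measuredRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): copy 1000 longs into a new list.
        // In the real test this would be createListOfLong(...) or toByteArray().
        List<Long> source = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            source.add(6249000000000000000L + i);
        }
        long avg = averageNanos(() -> new ArrayList<>(source), 10_000, 10_000);
        System.out.println("average per call: " + avg + " ns");
    }
}
```

The same helper can wrap the object creation and the serialization separately, so each gets its own warmed-up average.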

My questions are

  1. Is there an alternative message format for sending an ArrayList of 19-digit long values?

  2. Is there any way to get the byte array below 8 KB?

  3. Why does protobuf object creation take so much processing time and memory?

  4. Is there any way to reduce the processing time and memory allocated?

Bashir
  • 1) Yes, but why do you need a different one? 2) Most likely no, if the array contains random 64-bit integers. If the values are not random, maybe. Depending on the nature and properties of the list, it can probably be compressed to less than 8K, but I can't tell for sure without knowing what your list looks like. 3) Because you [measure it wrong](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). 4) What are the requirements? – apangin Apr 20 '20 at 14:07
  • I need to know whether an alternative method could produce a byte array smaller than 8k, and whether protobuf has any compression. My list contains numbers like 6249000000000000000, 6249000000000000123, 6249000000000000243, 6249000000000002331, and so on. Can you suggest a compression scheme for such numbers? Note that the list is not sorted. – Dhayalan RMoorthy Apr 20 '20 at 16:36
  • If the neighbour numbers are close to each other (seems like your case), you can use delta encoding and store deltas as Varint. – apangin Apr 20 '20 at 19:48
  • Is there any other way to convert the ArrayList of 1000 longs into a byte array smaller than 8k without using protobuf? – Dhayalan RMoorthy Apr 21 '20 at 05:10
  • I've just told above: store deltas between neighbour elements using varint encoding. – apangin Apr 21 '20 at 08:11
  • If you are using proto2, specifying packed=true in the field can save you a considerable amount of storage: https://developers.google.com/protocol-buffers/docs/proto#specifying-field-rules – Rafi Kamal Apr 26 '20 at 05:47
  • In proto3, repeated fields are packed by default: https://developers.google.com/protocol-buffers/docs/encoding#packed – Dhayalan RMoorthy May 04 '20 at 06:16
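
The delta-plus-varint suggestion from the comments can be sketched in plain Java without protobuf. This is an illustration under stated assumptions, not a definitive implementation: `DeltaVarint`, `encode`, and `decode` are hypothetical names, and ZigZag encoding is added here because the list is described as unsorted, so deltas may be negative. Within protobuf itself, a similar effect could probably be had by storing the deltas in a `repeated sint64` field, which also uses ZigZag varints.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the protobuf API.
class DeltaVarint {

    // ZigZag maps signed values to unsigned ones so small negatives stay small.
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    static long unZigZag(long n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Writes each value as the ZigZag varint of its difference from the previous value.
    static byte[] encode(List<Long> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long value : values) {
            long v = zigZag(value - previous);
            // Varint: 7 payload bits per byte, high bit set while more bytes follow.
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80));
                v >>>= 7;
            }
            out.write((int) v);
            previous = value;
        }
        return out.toByteArray();
    }

    // Reverses encode: reads varints, undoes ZigZag, and accumulates the deltas.
    static List<Long> decode(byte[] bytes) {
        List<Long> values = new ArrayList<>();
        long previous = 0;
        int i = 0;
        while (i < bytes.length) {
            long v = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                v |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += unZigZag(v);
            values.add(previous);
        }
        return values;
    }

    public static void main(String[] args) {
        // The four sample values quoted in the comments.
        List<Long> sample = List.of(
                6249000000000000000L, 6249000000000000123L,
                6249000000000000243L, 6249000000000002331L);
        byte[] encoded = encode(sample);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + decode(encoded).equals(sample));
    }
}
```

The first value still costs a full-width varint, but each later value costs only as many bytes as its delta needs. How far below 8k the full list lands depends on how close neighbouring values actually are, which the four samples alone cannot show.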
