6

I have a python list of integers, and I'd like to know how much space it will take up when encoded as a sequence of Protocol Buffers variable-length integers, or varints. What's the best way to figure this out without actually encoding the integers?

my_numbers = [20, 69, 500, 38987982344444, 420, 99, 1, 999]
e = MyCoolVarintArrayEncoder(my_numbers)
print(len(e))  # ???
Emmett Butler
  • 5,969
  • 2
  • 29
  • 47

1 Answers1

5

Each integer is encoded in base 128, one byte per "digit". The length of an integer value's representation in any base is ceil(log(value, base)).

Take the log(base=128) of each integer; round those values up to the nearest integer; sum those rounded values, and there's your length.

Emmett Butler
  • 5,969
  • 2
  • 29
  • 47
Prune
  • 76,765
  • 14
  • 60
  • 81
  • 2
    great answer, but also to note: this is true for positive integers; negative numbers are always... what, 10 bytes? – Marc Gravell Jul 30 '18 at 23:28
  • 1
    Oops, sorry; this is about the protobuf `varint`, not the Python `int`; but yeah, for signed int32 or int64 converted to varint, negative numbers are always ten bytes long, as @MarcGravell said. – abarnert Jul 30 '18 at 23:42
  • Using `Math.Log` in C#, the log (base 128) of 0 returns -8, and log (base 128) of 1 returns 0. Hence I think you would need to special case both of those values as well, to return a minimum length of 1. (And yes, I know this is a Python question, but maybe Python's log function gives the same values). – Steven Rands Sep 03 '21 at 11:27