1

I couldn't find any information about data size limits in the Avro spec.

What is the max size limit for each primitive & complex field? What is the max size limit per data block or object?

Some of the field size limitations may depend on the language we use; for example, Java has a size limit on String objects.

Kevin Si
  • Well, strings hold UTF-8 data prefixed by a long, so the max string length should be limited by the max long. Floats and doubles have a fixed size; bytes and arrays are bounded by the max long. See the binary spec https://avro.apache.org/docs/1.8.2/spec.html#binary_encoding – OneCricketeer Aug 19 '18 at 15:53
  • Thanks! I missed that part: bytes are encoded as a long followed by that many bytes of data. A string is encoded as a long followed by that many bytes of UTF-8 encoded character data. – Kevin Si Aug 20 '18 at 20:34
  • Yeah, with the caveat that a long uses "variable-length zig-zag coding", and you can click on the links there for more info on that – OneCricketeer Aug 20 '18 at 21:46

1 Answer

2

In the Avro spec, the sizes of arrays, strings, and maps are in general all bounded by a "variable-length" "zig-zag" coded long.

And a varint has no set size, as far as I know, because the byte reader keeps reading for as long as the high-order bit of each byte is set:

A variable-length format for positive integers is defined where the high-order bit of each byte indicates whether more bytes remain to be read.
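As a rough illustration of that encoding, here is a minimal sketch in Java. It is not Avro's own implementation: the class and method names (`ZigZagVarint`, `encode`, `decode`) are made up for this example, and it assumes the value is a 64-bit long, which is how the spec defines Avro's `long` type.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of the Avro library.
final class ZigZagVarint {

    // Zig-zag maps signed values to unsigned ones: 0->0, -1->1, 1->2, -2->3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Inverse of zigZag.
    static long unZigZag(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Writes 7 bits per byte, low-order group first; the high-order bit
    // of each byte is set when more bytes follow.
    static byte[] encode(long n) {
        long z = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // Reads bytes until one arrives with the high-order bit clear.
    static long decode(byte[] buf) {
        long z = 0;
        int shift = 0;
        for (byte b : buf) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return unZigZag(z);
    }

    public static void main(String[] args) {
        System.out.println(encode(0L).length);             // 1
        System.out.println(encode(64L).length);            // 2
        System.out.println(encode(Long.MAX_VALUE).length); // 10
        System.out.println(decode(encode(-12345L)));       // -12345
    }
}
```

Under that 64-bit assumption the prefix itself never needs more than 10 bytes, since 64 bits split into 7-bit groups take at most 10 groups.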

While I don't know the max limit on that, even if we just base the logic purely on Java longs, then

Long.MAX_VALUE =  9223372036854775807

and that in bytes is roughly 9.22 exabytes (about 9,220 petabytes), so I think you should be fine.

On the Java side of things, though (and in most other languages whose string type uses an integer-sized length), Strings have a much smaller size limit:

How many characters can a Java String have?
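To make the gap between the two limits concrete, here is a small sketch. The class and helper names (`LengthLimits`, `fitsInJavaInt`) are hypothetical and only illustrate the point: a Java `String` or array length is an `int`, so a length prefix that is valid as a long can still be far too large to materialize. Treat `Integer.MAX_VALUE` as an upper bound; a given JVM may allow slightly less.

```java
// Hypothetical illustration class, not part of the Avro library.
final class LengthLimits {

    // Can a decoded length prefix be used as a Java array or String length?
    // This is an upper bound only; actual JVM limits may be a little lower.
    static boolean fitsInJavaInt(long declaredLength) {
        return declaredLength >= 0 && declaredLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);             // 2147483647
        System.out.println(fitsInJavaInt(1_000_000L));     // true
        System.out.println(fitsInJavaInt(Long.MAX_VALUE)); // false
    }
}
```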

OneCricketeer