
I'm writing a FLAC parser and need to parse the header of each frame. One field is described in the FLAC format specification (https://xiph.org/flac/format.html#frame_header) as:

if(variable blocksize) <8-56>:"UTF-8" coded sample number (decoded number is 36 bits) [4] else <8-48>:"UTF-8" coded frame number (decoded number is 31 bits) [4]

and [4] says:

The "UTF-8" coding used for the sample/frame number is the same variable length code used to store compressed UCS-2, extended to handle larger input.

I can't work out how to determine the size of this field, given that it can be anywhere from 8 to 56 (or 8 to 48) bits. And why is the decoded number then 36 or 31 bits? Also, when I open a FLAC file in a hex editor with UTF-8 encoding, there are no numbers in this field. I would be very grateful for any help.

live2
Voice1081

1 Answer


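"UTF-8" coded means that the first byte of the field (the 5th byte of the frame header) starts with marker bits that tell you how many of the following bytes belong to the frame/sample number. The field is therefore a whole number of bytes, and only the bits left over after the marker bits carry the value, which is why an 8-56 bit field decodes to at most 36 bits.

The layout of those marker bits is described here: https://en.wikipedia.org/wiki/UTF-8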

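If the number is coded with 48 bits (6 bytes), it looks like this: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

It has 31 payload bits (the 'x' positions: 1 in the first byte plus 6 in each of the 5 following bytes), which you can extract and put into a more manageable type, such as UInt32.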
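The decoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decode_flac_number` is made up for this example and is not taken from libFLAC or any other library, and the sketch does not reject overlong encodings.

```python
def decode_flac_number(data: bytes, offset: int = 0) -> tuple:
    """Decode one "UTF-8" coded number starting at data[offset].

    Hypothetical helper for illustration. Returns (value, bytes_consumed)
    and raises ValueError on malformed or truncated input.
    """
    first = data[offset]

    # 0xxxxxxx: a single byte carrying 7 payload bits.
    if first & 0x80 == 0:
        return first, 1

    # Count the leading 1 bits; that count is the total length in bytes.
    length = 0
    mask = 0x80
    while first & mask:
        length += 1
        mask >>= 1

    # 10xxxxxx is a continuation byte and cannot start a number;
    # 11111111 has no terminating 0 bit, so it is invalid too.
    if length == 1 or length > 7:
        raise ValueError("invalid first byte: 0x%02x" % first)
    if offset + length > len(data):
        raise ValueError("truncated coded number")

    # Bits below the terminating 0 in the first byte are payload.
    value = first & (mask - 1)

    # Every following byte must look like 10xxxxxx and adds 6 bits.
    for byte in data[offset + 1 : offset + length]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte: 0x%02x" % byte)
        value = (value << 6) | (byte & 0x3F)

    return value, length


value, size = decode_flac_number(bytes([0xC2, 0x80]))
print(value, size)  # 128 2
```

Python integers are arbitrary precision, so the 36-bit case needs no special handling here. In a language with fixed-width integers, the 7-byte form needs a 64-bit type, because a 32-bit shift by 30 would lose the top bits.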

Mijo
  • So if the first byte is `0b11111110`, I could read six more bytes and reconstruct it like this (omitting the first byte, since no room is left after the 0)? `codedNumber = ((codedNumberBytes[1] & 0x3f /* 00111111 */) << 30) | ((codedNumberBytes[2] & 0x3f) << 24) | ((codedNumberBytes[3] & 0x3f) << 18) | ((codedNumberBytes[4] & 0x3f) << 12) | ((codedNumberBytes[5] & 0x3f) << 6) | (codedNumberBytes[6] & 0x3f);` – Edward Eddy67716 Dec 28 '21 at 09:35