
I'm using the WebCodecs AudioDecoder to decode OGG files (vorbis and opus). The codec string setting in the AudioDecoder configuration is vorbis and opus, respectively.

I have the container parsed into pages, and the AudioDecoder is almost ready for work.

However, I'm unable to figure out the description field it's expecting. I've read up on Vorbis WebCodecs Registration, but I'm still lost. That is:

let decoder = new AudioDecoder({ ... });

decoder.configure({
  description: "", // <----- What do I put here?
  codec: "vorbis",
  sampleRate: 44100,
  numberOfChannels: 2,
});

Edit: I understand it's expecting key information about how the OGG file is structured. What I don't understand is what goes there exactly. How does the string even look? Is it a dot-separated string of arguments?

John Weisz
  • It's of type ArrayBuffer, containing the data as described [here](https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-610004.2). I guess [this page](https://xiph.org/vorbis/doc/framing.html) contains the structural information. As it is an ArrayBuffer, i.e. an array of bytes, there isn't much choice regarding how the data is segmented / structured. Bytes are expected to be in a certain order, and I believe the segment_table part on the xiph webpage should tell how the data is segmented. – Swiffy Jun 26 '22 at 19:54
  • Reading your issue on the codec-parser library, it seems you're after the PCM audio data of these files. Any reason you don't use the Web Audio API's `decodeAudioData()` for this? – Kaiido Jul 01 '22 at 02:24
  • @Kaiido Good observation, but I have 3 very good reasons: (1) no support for partial decoding (meaning you can't decode only a part of a file, which leads to high memory use), and (2) no direct access from worker threads, meaning you have to schedule audio decoding from the main thread, and (3) Chromium-based runtimes actually allocate the final `AudioBuffer` on the main thread, causing a brief main thread block with even slightly longer files (a minute or longer or so). – John Weisz Jul 02 '22 at 11:17

1 Answer


https://www.w3.org/TR/webcodecs-vorbis-codec-registration/#audiodecoderconfig-description

AudioDecoderConfig.description is required. It is assumed to be in Xiph extradata format, described in [OGG-FRAMING]. This format consists in the page_segments field, followed by the segment_table field, followed by the three Vorbis header packets, respectively the identification header, the comments header, and the setup header, in this order, as described in section 4.2 of [VORBIS].

https://www.w3.org/TR/webcodecs-opus-codec-registration/#audiodecoderconfig-description

AudioDecoderConfig.description can optionally be set to an Identification Header, described in section 5.1 of [OPUS-IN-OGG].

If an AudioDecoderConfig.description has been set, the bitstream is assumed to be in ogg format.

If an AudioDecoderConfig.description has not been set, the bitstream is assumed to be in opus format.

If you want a good explanation of how the OGG/Opus header is structured, [OPUS-IN-OGG] is quite instructive.
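To make the Opus case concrete, here is a minimal sketch of reading the fields of the identification header ("OpusHead") as laid out in section 5.1 of [OPUS-IN-OGG]. The function name `parseOpusHead` is made up for illustration, and it assumes you already have the header packet as a `Uint8Array`.

```javascript
// Hypothetical helper (not part of WebCodecs or codec-parser):
// reads the fixed fields of an Opus identification header packet.
function parseOpusHead(bytes) {
  // The packet starts with the 8-byte ASCII magic "OpusHead".
  const magic = String.fromCharCode(...bytes.subarray(0, 8));
  if (magic !== "OpusHead") {
    throw new Error("not an Opus identification header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: view.getUint8(8),
    channelCount: view.getUint8(9),
    preSkip: view.getUint16(10, true),          // little-endian
    inputSampleRate: view.getUint32(12, true),  // little-endian, informational
    outputGain: view.getInt16(16, true),        // little-endian, signed
    mappingFamily: view.getUint8(18),
  };
}
```

The same byte array you pass to this function is what would go into `description`; `channelCount` is a plausible source for `numberOfChannels` in the decoder configuration.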

The OGG/Vorbis header is a bit more vague. There is no documentation on what Xiph extradata is, so one can only trust the W3C docs on how it is structured and compare them to the OGG/Vorbis docs on the fields ([OGG-FRAMING]).

Essentially, you need to provide the decoder with the relevant binary data headers for the file you are decoding, as ArrayBuffer, TypedArray, or DataView. You can get this from the binary file contents you are decoding.

Unfortunately, to get at this data, you will likely need to parse the format of the underlying OGG container. The WebCodecs API is intended for low-level use, that is, for handling codecs, not the containers themselves. See this GitHub issue where someone runs into similar issues regarding descriptions, and is told to parse the container themselves. Parsing the container is outside of the scope of this API.
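For reference, the `page_segments` and `segment_table` fields mentioned in the spec text come from the Ogg page header described in [OGG-FRAMING]. A minimal sketch of reading them from raw bytes follows; the helper name `readOggPageHeader` is invented here, and it skips CRC checking and packets that continue across pages.

```javascript
// Hypothetical helper: reads the header of one Ogg page starting at `offset`.
// Layout per the Ogg framing spec: "OggS", version, header type,
// granule position (8), serial (4), sequence (4), CRC (4),
// page_segments (1), then segment_table (page_segments bytes).
function readOggPageHeader(bytes, offset = 0) {
  const magic = String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (magic !== "OggS") {
    throw new Error("not an Ogg page");
  }
  const pageSegments = bytes[offset + 26];
  const segmentTable = bytes.subarray(offset + 27, offset + 27 + pageSegments);
  // The page body length is the sum of all lacing values in the segment table.
  const bodyLength = segmentTable.reduce((sum, n) => sum + n, 0);
  return {
    headerType: bytes[offset + 5],
    pageSegments,
    segmentTable,
    bodyOffset: offset + 27 + pageSegments,
    bodyLength,
  };
}
```

A library such as codec-parser already does this work, so this is only meant to show where the two fields live.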

Perhaps you could use an external OGG parsing library, or opt for a higher level audio processing class like the WebAudio API or a WebAssembly library?

UPDATE:

To clarify on what should go into a description, the description field is passed directly to FFmpeg's extradata in Chromium.

The docs specify that for OGG/Opus, you should set the description to be the contents of the 0th page, that is, the identification header (in binary).

For OGG/Vorbis, the documentation is pretty bad and quite vague. I'll be checking the FFmpeg source for this. It seems to be the identification header followed by the setup header; since the setup header is the third header, the mandatory comment header would sit in between.

So, to summarise, the description field should hold the binary contents of the headers of the relevant codec. For OGG/Opus, you would provide the binary contents of the first page (the identification header). For OGG/Vorbis, you would provide the binary contents of the first three packets (the identification header, comment header, and setup header).

The documentation suggests codec-parser provides the data for the headers as OpusHeader.data and VorbisHeader.{data,comments,setup}.

Try concatenating the three for Vorbis and see if that works. (Note that comments and setup are not initialized at the same time as data.)

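// opus: the identification header as-is
const opusDesc = hdr.data;

// vorbis: identification, comment, and setup headers back to back
// (hdr.comments and hdr.setup must already be populated)
const vorbisDesc = new Uint8Array(hdr.data.length + hdr.comments.length + hdr.setup.length);
vorbisDesc.set(hdr.data);
vorbisDesc.set(hdr.comments, hdr.data.length);
vorbisDesc.set(hdr.setup, hdr.data.length + hdr.comments.length);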
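As the comments below point out, the Vorbis registration also expects `page_segments` and `segment_table` in front of the three headers. One possible reading, sketched here, is the Xiph lacing layout that FFmpeg's `avpriv_split_xiph_headers` accepts: a count byte of 2 (number of packets minus one), then the lengths of the first two headers as runs of 255 plus a remainder, then the three packets. This is an assumption about what the decoder wants, not something the spec spells out, and `buildXiphExtradata` is a made-up name.

```javascript
// Hypothetical helper: packs three Vorbis header packets (Uint8Arrays)
// into Xiph-laced extradata. The layout is an assumption, see above.
function buildXiphExtradata(id, comments, setup) {
  // Xiph lacing: a length is written as N bytes of 255 followed by
  // one byte below 255 (which may be 0).
  const lace = (length) => {
    const out = [];
    while (length >= 255) {
      out.push(255);
      length -= 255;
    }
    out.push(length);
    return out;
  };
  // Count byte, then laced sizes of the first two packets only;
  // the third packet's size is implied by the remaining bytes.
  const prefix = [2, ...lace(id.length), ...lace(comments.length)];
  const desc = new Uint8Array(
    prefix.length + id.length + comments.length + setup.length
  );
  let offset = 0;
  for (const part of [prefix, id, comments, setup]) {
    desc.set(part, offset);
    offset += part.length;
  }
  return desc;
}
```

With codec-parser's header object this would be called as `buildXiphExtradata(hdr.data, hdr.comments, hdr.setup)`, and the result passed as `description`.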
AlexApps99
  • Thanks for your answer. I already have the container parsed (i.e. I have the OGG pages), as I mentioned in the question. What I have difficulty finding out is how to construct the `description` string. If it helps, I'm using [codec-parser](https://github.com/eshaz/codec-parser) for doing this. – John Weisz Jun 28 '22 at 18:07
  • I have updated the answer with additional information, I hope this helps – AlexApps99 Jun 28 '22 at 23:46
  • Thank you for your amazingly detailed answer, I already have Opus working nicely thanks to it. Still working on Vorbis. Checking https://www.w3.org/TR/webcodecs-vorbis-codec-registration/#audiodecoderconfig-description it seems it's expecting 5 pieces: `page_segments`, `segment_table` and _then_ `data`, `comments`, `setup`. – John Weisz Jun 30 '22 at 15:02
  • You're right, I forgot to mention that. `page_segments` and `segment_table` likely correspond to what FFmpeg decodes in `avpriv_split_xiph_headers`. – AlexApps99 Jun 30 '22 at 23:37