Has anybody some advice on programming realtime audio synthesis?

Question

I'm currently working on a personal project: creating a library for realtime audio synthesis in Flash. In short: tools to connect wavegenarators, filters, mixers, etc with eachother and supply the soundcard with raw (realtime) data. Something like max/msp or Reaktor.

I already have some working stuff, but I'm wondering if the basic setup that I wrote is right. I don't want to run into problems later on that force me to change the core of my app (although that can always happen).

Basically, what I do now is start at the end of the chain, at the place where the (raw) sounddata goes 'out' (to the soundcard). To do that, I need to write chunks of bytes (ByteArrays) to an object, and to get that chunk I ask whatever module is connected to my 'Sound Out' module to give me his chunk. That module does the same request to the module that's connected to his input, and that keeps happening until the start of the chain is reached.

Is this the right approach? I can imagine running into problems if there's a feedbackloop, or if there's another module with no output: if i were to connect a spectrumanalyzer somewhere, that would be a dead end in the chain (a module with no outputs, just an input). In my current setup, such a module wouldnt work because i only start calculating from the sound-output module.

Has anyone experience with programming something like this? I'd be very interested in some thoughts about the right approach. (For clarity: i'm not looking for specific Flash-implementations, and that's why i didnt tag this question under flash or actionscript)

score 1 · Answer 1 · answered Nov 30 '10 at 18:29

I did a similar thing a while back, and I used the same approach as you do - start at the virtual line out, and trace the signal back to the top. I did this per sample though, not per buffer; if I were to write the same application today, I might choose per-buffer instead though, because I suspect it would perform better.

The spectrometer was designed as an insert module, that is, it would only work if both its input and its output were connected, and it would pass its input to the output unchanged.

To handle feedback, I had a special helper module that introduced a 1-sample delay and would only fetch its input once per cycle.

Also, I think doing all your internal processing with floats, and thus arrays of floats as the buffers, would be a lot easier than byte arrays, and it would save you the extra effort of converting between integers and floats all the time.

score 0 · Answer 2 · answered Nov 30 '10 at 22:15

In later versions you may have different packet rates in different parts of your network.

One example would be if you extend it to transfer data to or from disk. Another example would be that low data rate control variables such as one controlling echo-delay may, later, become a part of your network. You probably don't want to process control variables with the same frequency that you process audio packets, but they are still 'real time' and part of the function network. They may for example need smoothing to avoid sudden transitions.

As long as you are calling all your functions at the same rate, and all the functions are essentially taking constant-time, your pull-the-data approach will work fine. There will be little to choose between pulling data and pushing. Pulling is somewhat more natural for playing audio, pushing is somewhat more natural for recording, but either works and ends up making the same calls to the underlying audio processing functions.

For the spectrometer you've got the issue of multiple sinks for data, but it is not a problem. Introduce a dummy link to it from the real sink. The dummy link can cause a request for data that is not honoured. As long as the dummy link knows it is a dummy and does not care about the lack of data, everything will be OK. This is a standard technique for reducing multiple sinks or sources to a single one.
With this kind of network you do not want to do the same calculation twice in one complete update. For example if you mix a high-passed and low-passed version of a signal you do not want to evaluate the original signal twice. You must do something like record a timer tick value with each buffer, and stop propagation of pulls when you see the current tick value is already present. This same mechanism will also protect you against feedback loops in evaluation.

So, those two issues of concern to you are easily addressed within your current framework.

Rate matching where there are different packet rates in different parts of the network is where the problems with the current approach will start. If you are writing audio to disk then for efficiency you'll want to write large chunks infrequently. You don't want to be blocking your servicing of the more frequent small audio input and output processing packets during those writes. A single rate pulling or pushing strategy on its own won't be enough.

Just accept that at some point you may need a more sophisticated way of updating than a single rate network. When that happens you'll need threads for the different rates that are running, or you'll write your own simple scheduler, possibly as simple as calling less frequently evaluated functions one time in n, to make the rates match. You don't need to plan ahead for this. Your audio functions are almost certainly already delegating responsibility for ensuring their input buffers are ready to other functions, and it will only be those other functions that need to change, not the audio functions themselves.

The one thing I would advise at this stage is to be careful to centralise audio buffer allocation, noticing that buffers are like fenceposts. They don't belong to an audio function, they lie between the audio functions. Centralising the buffer allocation will make it easy to retrospectively modify the update strategy for different rates in different parts of the network.

Has anybody some advice on programming realtime audio synthesis?

2 Answers2