
I've been trying to mix together two 16-bit linear PCM audio streams and I can't seem to overcome the noise issues. I think the noise comes from overflow when the samples are summed.

I have the following function ...

short int mix_sample(short int sample1, short int sample2)
{
    return #mixing_algorithm#;
}

... and here's what I have tried as #mixing_algorithm#

sample1/2 + sample2/2
2*(sample1 + sample2) - 2*(sample1*sample2) - 65535
(sample1 + sample2) - sample1*sample2
(sample1 + sample2) - sample1*sample2 - 65535
(sample1 + sample2) - ((sample1*sample2) >> 0x10) // same as dividing by 65536

Some of them have produced better results than others but even the best result contained quite a lot of noise.

Any ideas how to solve it?
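To illustrate what I think is happening: summing two samples directly overflows the 16-bit range, and on typical two's-complement platforms the result wraps around (the function name here is just for illustration):

```c
/* Sketch of the problem: 30000 + 10000 = 40000 doesn't fit in a short,
   so on typical two's-complement platforms it wraps to -25536.
   (Conversion of an out-of-range value to short is implementation-defined.) */
short naive_mix(short s1, short s2)
{
    return (short)(s1 + s2);
}
```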

Clifford
Ragnar

6 Answers


The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; adapting it for 16-bit signed PCM produces this:

int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here

// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;

// Pick the equation
if ((a < 32768) || (b < 32768)) {
    // Viktor's first equation when both sources are "quiet"
    // (i.e. less than middle of the dynamic range)
    m = a * b / 32768;
} else {
    // Viktor's second equation when one or both sources are loud
    // (widened to 64 bits: with both operands >= 32768, a * b can
    // exceed the range of a 32-bit int)
    m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
}

// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;

Using this algorithm means there is almost no need to clip the output as it is only one value short of being within range. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
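Wrapped into a reusable function, the above might look like this (the name `mix_viktor` is mine, and I widen `a * b` to 64 bits because for two loud samples the product can overflow a 32-bit int):

```c
/* Sketch: Viktor Toth's mix for 16-bit signed PCM, as a function.
   The function name and the 64-bit widening of a*b are my additions. */
short mix_viktor(short s1, short s2)
{
    int a = s1 + 32768;   /* shift to unsigned range 0..65535 */
    int b = s2 + 32768;
    int m;

    if (a < 32768 || b < 32768) {
        /* at least one source below the midpoint */
        m = (int)((long long)a * b / 32768);
    } else {
        /* both sources at or above the midpoint */
        m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
    }

    if (m == 65536)       /* the single out-of-range value */
        m = 65535;
    return (short)(m - 32768);   /* back to signed */
}
```

Note that mixing with silence returns the other sample unchanged in both branches, which is the property that straight averaging lacks.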

Malvineous
  • What do you mean by "quiet"? - that would normally mean *low magnitude* (*near* the middle), but here you appear to mean *negative* (below the middle), whereas the "loud" equation is executed when *one or both are positive* (before shifting - i.e. adding a DC bias). Apart from that, *volume* is a perception of the *signal*, not an individual sample - a "loud" sound will have samples across the entire range. – Clifford Aug 03 '14 at 07:28
  • @Clifford: Middle being the middle of the available range, so if the values are between 0 and 65535, then the middle is 32767. It is better explained at the link to Viktor Toth's page. – Malvineous Aug 03 '14 at 07:30
  • I realise that - my question was rhetorical - the terms "quiet" and "loud" are inaccurate and misleading in this context. – Clifford Aug 03 '14 at 07:32
  • Which is exactly why I put "quiet" in scare quotes, to hint that the meaning is a little different to what you might expect :-) Plus I then followed it with an explanation of what I meant... – Malvineous Aug 03 '14 at 07:36
  • In the original explanation it is about the relationship to the mid-point; the term "quiet" is used differently and correctly there to mean "close to the midpoint". Although this is the best answer IMO (hence the up-vote), the comments are a misrepresentation of Viktor Toth's explanation. – Clifford Aug 03 '14 at 07:57

Here's a descriptive implementation:

#include <cstdint>
#include <limits>

short int mix_sample(short int sample1, short int sample2) {
    // Sum in a wider type so the addition itself cannot overflow
    const std::int32_t result = static_cast<std::int32_t>(sample1)
                              + static_cast<std::int32_t>(sample2);
    typedef std::numeric_limits<short int> Range;
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}

To mix, it's just add and clip!

To avoid clipping artifacts, you will want to use saturation or a limiter. Ideally, you will have a small int32_t buffer with a small amount of lookahead. This will introduce latency.

More common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal.
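A minimal sketch of the headroom idea (the function name is mine): attenuate each source before summing so that two sources can never exceed the 16-bit range, and clip defensively anyway:

```c
#include <limits.h>

/* Sketch: one bit of headroom per source; the name is mine. */
short mix_with_headroom(short s1, short s2)
{
    int sum = (int)s1 / 2 + (int)s2 / 2;   /* each source attenuated ~6 dB */
    if (sum > SHRT_MAX) sum = SHRT_MAX;    /* defensive clip */
    if (sum < SHRT_MIN) sum = SHRT_MIN;
    return (short)sum;
}
```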

justin
  • The only "correct" way to avoid clipping is to divide by two. There is some illustrative code here in the "Distortion and Noise" section: http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html – Bjorn Roche Aug 23 '13 at 12:40
  • Have to downvote this because it only solves the 'local' issue of mixing a single sample. If you look at a big soundwave, this is actually a horrible algorithm, since it will cut off high-amplitude waves and introduce clipping noise. One proper way is to use float samples and smoothly apply dynamic wave amplitude compression. This will ensure no artificial clipping occurs - the sound will just get quieter during high amplitudes. – Jorma Rebane Oct 04 '16 at 08:00
  • @JormaRebane Do you systematically downvote answers to beginners' questions on every subject? – justin Oct 13 '16 at 07:17
  • The divide-by-two method halves the volume of the output when one signal is silent. Probably not what one wants. – Tony Jul 11 '18 at 02:55

Here is what I did on my recent synthesizer project.

#include <stdlib.h>
#include <limits.h>

int *unfiltered = (int *)malloc(lengthOfLongPcmInShorts * sizeof(int));
int i;
for (i = 0; i < lengthOfShortPcmInShorts; i++) {
    unfiltered[i] = shortPcm[i] + longPcm[i];
}
for (; i < lengthOfLongPcmInShorts; i++) {
    unfiltered[i] = longPcm[i];
}

int max = 0;
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    int val = abs(unfiltered[i]);
    if (val > max)
        max = val;
}
if (max == 0)
    max = 1;   /* all-silent input: avoid divide by zero */

short int *newPcm = (short int *)malloc(lengthOfLongPcmInShorts * sizeof(short int));
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    /* divide in floating point; integer division would give only 0 or ±1 */
    newPcm[i] = (short int)(((double)unfiltered[i] / max) * SHRT_MAX);
}

I added all the PCM data into an integer array, so that I get all the data unfiltered.

After doing that I looked for the absolute max value in the integer array.

Finally, I converted the integer array into a short int array by taking each element, dividing it by that max value, and then multiplying by the maximum short int value.

This way you get the minimum amount of 'headroom' needed to fit the data.

You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed the minimum amount of headroom was good enough for me.

Nir

There's a discussion here: https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden down in one of the comments on that discussion is the suggestion to sum the values and divide by the square root of the number of signals; an additional check for clipping couldn't hurt. This seems like a reasonable (simple and fast) middle ground.

Ken

I think they should be functions mapping [MIN_SHORT, MAX_SHORT] -> [MIN_SHORT, MAX_SHORT], and (apart from the first one) they clearly are not, so overflow occurs.

If unwind's suggestion doesn't work, you can also try:

((long int)(sample1) + sample2) / 2
Pawel Zubrycki
  • While adding the signals is correct, with simple *normalisation* to maintain range one signal will affect the other undesirably. For example if `sample1` is always zero (silent), you would want *only* `sample2`, but you get `sample2 / 2` - i.e. the output is quieter. – Clifford Aug 03 '14 at 08:01
  • Yes, you are totally right, but it solves the problems of overflow and clipping. The best solution IMHO would be to scale the signals depending on their value, like `w(s1,s2)*s1 + (1-w(s1,s2))*s2` where `w(s1,s2)` is some function with `w(s1,0) = 1`, `w(0,s2) = 0` and `0 < w(s1,s2) < 1` when `s1 != 0 && s2 != 0` – Pawel Zubrycki Dec 20 '14 at 10:51
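One concrete weight satisfying the constraints in the comment above is `w = |s1| / (|s1| + |s2|)`; this particular choice and the function name are my example, not from the comment:

```c
#include <stdlib.h>

/* Sketch of a weighted mix with w = |s1| / (|s1| + |s2|),
   so w(s1,0) = 1 and w(0,s2) = 0 as required. */
short mix_weighted(short s1, short s2)
{
    int a1 = abs((int)s1);
    int a2 = abs((int)s2);
    if (a1 + a2 == 0)
        return 0;                           /* both silent */
    double w = (double)a1 / (double)(a1 + a2);
    return (short)(w * s1 + (1.0 - w) * s2);
}
```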

Since you are in the time domain, the frequency information is in the differences between successive samples; when you divide by two you damage that information. That's why adding and clipping works better. Clipping will of course add very high-frequency noise, which is probably filtered out.

dizzy
  • I expect the noise the OP is hearing is caused by the values wrapping, rather than anything as subtle as a single bit of lost resolution – Will Jun 03 '14 at 12:42