
I've been trying to mix together two 16-bit linear PCM audio streams and I can't seem to overcome the noise issues. I think the noise comes from overflow when the samples are summed.

I have the following function ...

short int mix_sample(short int sample1, short int sample2)
{
    return #mixing_algorithm#;
}

... and here's what I have tried as #mixing_algorithm#

sample1/2 + sample2/2
2*(sample1 + sample2) - 2*(sample1*sample2) - 65535
(sample1 + sample2) - sample1*sample2
(sample1 + sample2) - sample1*sample2 - 65535
(sample1 + sample2) - ((sample1*sample2) >> 0x10) // same as dividing by 65536

Some of them have produced better results than others but even the best result contained quite a lot of noise.

Any ideas how to solve it?
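To illustrate what I think is happening: summing two samples directly overflows the 16-bit range, and on typical two's-complement platforms the result wraps around (the function name here is just for illustration):

```c
/* Sketch of the problem: 30000 + 10000 = 40000 doesn't fit in a short,
   so on typical two's-complement platforms it wraps to -25536.
   (Conversion of an out-of-range value to short is implementation-defined.) */
short naive_mix(short s1, short s2)
{
    return (short)(s1 + s2);
}
```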

Clifford
Ragnar

6 Answers


The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; adapting it for 16-bit signed PCM produces this:

int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here

// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;

// Pick the equation
if ((a < 32768) || (b < 32768)) {
    // Viktor's first equation when both sources are "quiet"
    // (i.e. less than middle of the dynamic range)
    m = a * b / 32768;
} else {
    // Viktor's second equation when one or both sources are loud
    // (widened to 64 bits: with both operands >= 32768, a * b can
    // exceed the range of a 32-bit int)
    m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
}

// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;

Using this algorithm means there is almost no need to clip the output as it is only one value short of being within range. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
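Wrapped into a reusable function, the above might look like this (the name `mix_viktor` is mine, and I widen `a * b` to 64 bits because for two loud samples the product can overflow a 32-bit int):

```c
/* Sketch: Viktor Toth's mix for 16-bit signed PCM, as a function.
   The function name and the 64-bit widening of a*b are my additions. */
short mix_viktor(short s1, short s2)
{
    int a = s1 + 32768;   /* shift to unsigned range 0..65535 */
    int b = s2 + 32768;
    int m;

    if (a < 32768 || b < 32768) {
        /* at least one source below the midpoint */
        m = (int)((long long)a * b / 32768);
    } else {
        /* both sources at or above the midpoint */
        m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
    }

    if (m == 65536)       /* the single out-of-range value */
        m = 65535;
    return (short)(m - 32768);   /* back to signed */
}
```

Note that mixing with silence returns the other sample unchanged in both branches, which is the property that straight averaging lacks.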

Malvineous
  • What do you mean by "quiet"? - that would normally mean *low magnitude* (*near* the middle), but here you appear to mean *negative* (below the middle), whereas the "loud" equation is executed when *one or both are positive* (before shifting - i.e. adding a DC bias). Apart from that, *volume* is a perception of the *signal*, not an individual sample - a "loud" sound will have samples across the entire range. – Clifford Aug 03 '14 at 07:28
  • @Clifford: Middle being the middle of the available range, so if the values are between 0 and 65535, then the middle is 32767. It is better explained at the link to Viktor Toth's page. – Malvineous Aug 03 '14 at 07:30
  • I realise that - my question was rhetorical - the terms "quiet" and "loud" are inaccurate and misleading in this context. – Clifford Aug 03 '14 at 07:32
  • Which is exactly why I put "quiet" in scare quotes, to hint that the meaning is a little different to what you might expect :-) Plus I then followed it with an explanation of what I meant... – Malvineous Aug 03 '14 at 07:36
  • In the original explanation it is about the relationship to the mid-point; the term "quiet" is used differently and correctly there to mean "close to the midpoint". Although this is the best answer IMO (hence the up-vote), the comments are a misrepresentation of Viktor Toth's explanation. – Clifford Aug 03 '14 at 07:57

Here's a descriptive implementation:

#include <cstdint>
#include <limits>

short int mix_sample(short int sample1, short int sample2) {
    // Sum in a wider type so the addition itself cannot overflow
    const std::int32_t result = static_cast<std::int32_t>(sample1)
                              + static_cast<std::int32_t>(sample2);
    typedef std::numeric_limits<short int> Range;
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}

To mix, it's just add and clip!

To avoid clipping artifacts, you will want to use saturation or a limiter. Ideally, you will have a small int32_t buffer with a small amount of lookahead. This will introduce latency.

More common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal.
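A minimal sketch of the headroom idea (the function name is mine): attenuate each source before summing so that two sources can never exceed the 16-bit range, and clip defensively anyway:

```c
#include <limits.h>

/* Sketch: one bit of headroom per source; the name is mine. */
short mix_with_headroom(short s1, short s2)
{
    int sum = (int)s1 / 2 + (int)s2 / 2;   /* each source attenuated ~6 dB */
    if (sum > SHRT_MAX) sum = SHRT_MAX;    /* defensive clip */
    if (sum < SHRT_MIN) sum = SHRT_MIN;
    return (short)sum;
}
```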

justin
  • The only "correct" way to avoid clipping is to divide by two. There is some illustrative code here in the "Distortion and Noise" section: http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html – Bjorn Roche Aug 23 '13 at 12:40
  • Have to downvote this because it only solves the 'local' issue of mixing a single sample. If you look at a big soundwave, this is actually a horrible algorithm, since it will cut off high-amplitude waves and introduce clipping noise. One proper way is to use float samples and smoothly apply dynamic wave amplitude compression. This will ensure no artificial clipping occurs - the sound will just get quieter during high amplitudes. – Jorma Rebane Oct 04 '16 at 08:00
  • @JormaRebane Do you systematically downvote answers to beginners' questions on every subject? – justin Oct 13 '16 at 07:17
  • The divide-by-two method halves the volume of the output when one signal is silent. Probably not what one wants. – Tony Jul 11 '18 at 02:55

Here is what I did on my recent synthesizer project.

#include <stdlib.h>
#include <limits.h>

int *unfiltered = (int *)malloc(lengthOfLongPcmInShorts * sizeof(int));
int i;
for (i = 0; i < lengthOfShortPcmInShorts; i++) {
    unfiltered[i] = shortPcm[i] + longPcm[i];
}
for (; i < lengthOfLongPcmInShorts; i++) {
    unfiltered[i] = longPcm[i];
}

int max = 0;
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    int val = abs(unfiltered[i]);
    if (val > max)
        max = val;
}
if (max == 0)
    max = 1;   /* all-silent input: avoid divide by zero */

short int *newPcm = (short int *)malloc(lengthOfLongPcmInShorts * sizeof(short int));
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    /* divide in floating point; integer division would give only 0 or ±1 */
    newPcm[i] = (short int)(((double)unfiltered[i] / max) * SHRT_MAX);
}

I added all the PCM data into an integer array, so that I get all the data unfiltered.

After doing that I looked for the absolute max value in the integer array.

Finally, I converted the integer array into a short int array by taking each element, dividing it by that max value, and then multiplying by the maximum short int value.

This way you get the minimum amount of 'headroom' needed to fit the data.

You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed the minimum amount of headroom was good enough for me.

Nir

There's a discussion here: https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden down in one of the comments on that discussion is the suggestion to sum the values and divide by the square root of the number of signals; an additional check for clipping couldn't hurt. This seems like a reasonable (simple and fast) middle ground.

Ken

I think they should be functions mapping [MIN_SHORT, MAX_SHORT] -> [MIN_SHORT, MAX_SHORT], and (apart from the first one) they clearly are not, so overflow occurs.

If unwind's suggestion doesn't work, you can also try:

((long int)(sample1) + sample2) / 2
Pawel Zubrycki
  • While adding the signals is correct, with simple *normalisation* to maintain range one signal will affect the other undesirably. For example if `sample1` is always zero (silent), you would want *only* `sample2`, but you get `sample2 / 2` - i.e. the output is quieter. – Clifford Aug 03 '14 at 08:01
  • Yes, you are totally right, but it solves the problems of overflow and clipping. The best solution IMHO would be to scale the signals depending on their value, like `w(s1,s2)*s1 + (1-w(s1,s2))*s2` where `w(s1,s2)` is some function with `w(s1,0) = 1`, `w(0,s2) = 0` and `0 < w(s1,s2) < 1` when `s1 != 0 && s2 != 0` – Pawel Zubrycki Dec 20 '14 at 10:51
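One concrete weight satisfying the constraints in the comment above is `w = |s1| / (|s1| + |s2|)`; this particular choice and the function name are my example, not from the comment:

```c
#include <stdlib.h>

/* Sketch of a weighted mix with w = |s1| / (|s1| + |s2|),
   so w(s1,0) = 1 and w(0,s2) = 0 as required. */
short mix_weighted(short s1, short s2)
{
    int a1 = abs((int)s1);
    int a2 = abs((int)s2);
    if (a1 + a2 == 0)
        return 0;                           /* both silent */
    double w = (double)a1 / (double)(a1 + a2);
    return (short)(w * s1 + (1.0 - w) * s2);
}
```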

Since you are in the time domain, the frequency information is in the differences between successive samples; when you divide by two you damage that information. That's why adding and clipping works better. Clipping will of course add very high-frequency noise, which is probably filtered out.

dizzy
  • I expect the noise the OP is hearing is caused by the values wrapping, rather than anything as subtle as a single bit of lost resolution – Will Jun 03 '14 at 12:42