35

I have two WAV files that I want to mix together into one WAV file. Both use the same sample format, etc.

I've been searching Google endlessly.

I would prefer to do it using the wave module in python.

How can this be done?

Marco Cerliani
james

6 Answers

41

You can use the pydub library (a light wrapper I wrote around the Python wave module in the standard library) to do it pretty simply:

from pydub import AudioSegment

sound1 = AudioSegment.from_file("/path/to/my_sound.wav")
sound2 = AudioSegment.from_file("/path/to/another_sound.wav")

combined = sound1.overlay(sound2)

combined.export("/path/to/combined.wav", format='wav')
Jiaaro
  • When I tried this, I could only hear the sound I was overlaying onto, not the sound I overlaid. Is there a way around that? – ArjunSahlot Jan 05 '21 at 01:13
10

A python solution which requires both numpy and audiolab, but is fast and simple:

import numpy as np
from scikits.audiolab import wavread

data1, fs1, enc1 = wavread("file1.wav")
data2, fs2, enc2 = wavread("file2.wav")

assert fs1 == fs2
assert enc1 == enc2
result = 0.5 * data1 + 0.5 * data2

If the sampling rates (fs*) or encodings (enc*) differ, you may need some audio processing first (strictly speaking, the asserts are too strong, as wavread can handle some cases transparently).

David Cournapeau
  • Shouldn't you `assert len(data1) == len(data2)` too? And you could also use `scipy` to read a wave file: `from scipy.io import wavfile` and then `fs, data = wavfile.read("file.wav")`. – Lewistrick Jan 23 '19 at 10:14
10

LIBROSA SOLUTION

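import librosa
import librosa.display
from IPython.display import Audio

# audio1 and audio2 are the paths of the two input files.
# librosa.load resamples to 22050 Hz by default, so both rates match.
y1, sample_rate1 = librosa.load(audio1, mono=True)
y2, sample_rate2 = librosa.load(audio2, mono=True)

# MERGE (truncate to the shorter clip so the arrays can be added)
n = min(len(y1), len(y2))
mixed = (y1[:n] + y2[:n]) / 2
# waveshow replaced waveplot, which was removed in librosa 0.10
librosa.display.waveshow(mixed, sr=sample_rate1)

# REPRODUCE
Audio(mixed, rate=sample_rate1)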
Marco Cerliani
8

You guys like numpy, no? Below is a solution that depends on wave and numpy. The 16-bit samples from the two files './file1.wav' and './file2.wav' are added together. It's probably a good idea to apply np.clip to mix before converting back to int16, because the sum can fall outside the 16-bit range (not included).

import wave
import numpy as np
# load two files you'd like to mix
fnames =["./file1.wav", "./file2.wav"]
wavs = [wave.open(fn) for fn in fnames]
frames = [w.readframes(w.getnframes()) for w in wavs]
# here's efficient numpy conversion of the raw byte buffers
# '<i2' is a little-endian two-byte integer.
samples = [np.frombuffer(f, dtype='<i2') for f in frames]
samples = [samp.astype(np.float64) for samp in samples]
# mix as much as possible
n = min(map(len, samples))
mix = samples[0][:n] + samples[1][:n]
# Save the result
mix_wav = wave.open("./mix.wav", 'w')
mix_wav.setparams(wavs[0].getparams())
# before saving, we want to convert back to '<i2' bytes:
mix_wav.writeframes(mix.astype('<i2').tobytes())
mix_wav.close()
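
The clipping step mentioned above could be sketched roughly as follows. This is only an illustration: `mix_int16` is a hypothetical helper name, and it assumes both inputs are int16 arrays as returned by `np.frombuffer(..., dtype='<i2')`, i.e. before the conversion to float64.

```python
import numpy as np


def mix_int16(a, b):
    """Sum two int16 sample arrays and clip the result to the int16 range."""
    # Mix only as many samples as the shorter input provides.
    n = min(len(a), len(b))
    # Widen to 32 bits first so the sum cannot wrap around.
    total = a[:n].astype(np.int32) + b[:n].astype(np.int32)
    # Saturate at the int16 limits instead of overflowing.
    info = np.iinfo(np.int16)
    return np.clip(total, info.min, info.max).astype(np.int16)
```

The result can be passed to `writeframes` after `.astype('<i2').tobytes()`, as in the code above. Clipping keeps loud passages from wrapping around, but it still distorts them; halving the sum avoids that at the cost of a quieter mix.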
Jus
  • Great code! It is better to halve the result in the mix line: `mix = (samples[0][:n] + samples[1][:n])/2` – mohammad Apr 30 '23 at 15:11
4

This depends heavily on the format the files are in. Here's an example of how to do it, assuming 2-byte-wide, little-endian samples:

import wave

w1 = wave.open("/path/to/wav/1")
w2 = wave.open("/path/to/wav/2")

# get the samples as raw bytes
samples1 = w1.readframes(w1.getnframes())
samples2 = w2.readframes(w2.getnframes())

# take every 2 bytes and group them together as 1 sample (b"123456" -> [b"12", b"34", b"56"])
samples1 = [samples1[i:i+2] for i in range(0, len(samples1), 2)]
samples2 = [samples2[i:i+2] for i in range(0, len(samples2), 2)]

# convert samples from bytes to ints
def bin_to_int(sample):
    # 16-bit WAV samples are signed and little-endian
    return int.from_bytes(sample, byteorder="little", signed=True)

samples1 = [bin_to_int(s) for s in samples1]  # [b'\x04\x08'] -> [0x0804]
samples2 = [bin_to_int(s) for s in samples2]

# average the samples (integer division keeps them ints)
samples_avg = [(s1 + s2) // 2 for (s1, s2) in zip(samples1, samples2)]

And now all that's left to do is convert samples_avg back to a binary string and write that to a file using the wave module's writeframes. That's just the inverse of what we just did, so it shouldn't be too hard to figure out. For your int_to_bin function, you'll probably want to make use of the function chr(code), which returns the character with the character code of code (the opposite of ord); in Python 3, int.to_bytes does the whole conversion in one call.
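
For reference, the whole round trip, including the write-back step, might look something like this using only the standard library. It is a minimal sketch rather than a definitive implementation: `mix_wav_files` is a hypothetical name, and it assumes 16-bit PCM input with matching channel count and sample rate.

```python
import array
import sys
import wave


def mix_wav_files(path1, path2, out_path):
    """Average two 16-bit PCM WAV files sample by sample and write the result."""
    with wave.open(path1, "rb") as w1, wave.open(path2, "rb") as w2:
        params = w1.getparams()
        # Assumption: this sketch only handles 16-bit samples.
        if w1.getsampwidth() != 2 or w2.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if (w1.getnchannels(), w1.getframerate()) != (w2.getnchannels(), w2.getframerate()):
            raise ValueError("channel count or sample rate differs")
        frames1 = w1.readframes(w1.getnframes())
        frames2 = w2.readframes(w2.getnframes())

    # Type code "h" is a signed 16-bit integer in native byte order.
    samples1 = array.array("h", frames1)
    samples2 = array.array("h", frames2)
    if sys.byteorder == "big":
        # WAV data is little-endian, so swap on big-endian machines.
        samples1.byteswap()
        samples2.byteswap()

    # zip stops at the shorter input, so the mix has the shorter length.
    mixed = array.array("h", ((a + b) // 2 for a, b in zip(samples1, samples2)))
    if sys.byteorder == "big":
        mixed.byteswap()

    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        # writeframes corrects the frame count in the header when closing.
        out.writeframes(mixed.tobytes())
```

Calling `mix_wav_files("a.wav", "b.wav", "mix.wav")` would write the averaged signal to `mix.wav`. Averaging rather than summing keeps every sample inside the 16-bit range, so no clipping is needed.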

Ponkadoodle
0

Try the Echo Nest Remix API:

from echonest import audio
from util import *

def mixSound(fname1, fname2, f_out_name):

  f1 = audio.AudioData(fname1)
  f2 = audio.AudioData(fname2)

  f_out = audio.mix(f1, f2)
  f_out.encode(f_out_name, True)

If it complains about codecs, check https://superuser.com/questions/196857/how-to-install-libmp3lame-for-ffmpeg.

Community
TTT