
I have a large library of pre-recorded music notes (around 1,200), all of consistent amplitude.

I'm researching methods of layering two notes over each other so that it sounds like a chord where both notes are played at the same time.

[Image: samples with different attack times]

As you can see, these samples have different peak amplitude points, which need to line up in order to sound like a human-played chord.

[Image: manually aligned attack points]

The second image shows the attack points manually aligned by ear, but this is an unfeasible method for such a large data set, where I wish to create many permutations of chord samples.

I'm considering a method whereby I identify the time of peak amplitude of two audio samples, and then align those two peak amplitude times when mixing the notes to create the chord. But I am unsure of how to go about such an implementation.

I'm thinking of using a Python mixing solution such as the one found here: Mixing two audio files together with python, with some tweaking to mix audio samples over each other.

I'm looking for ideas on how I can identify the times of peak amplitude in my audio samples, or if you have any thoughts on other ways this idea could be implemented I'd be very interested.

Matthieu Brucher
  • if for each pair of audio input files you had a sample good output file you could possibly train your system to auto adjust how it chooses to combine the pair of input files to best match the output file then leverage that training to perform similar combining without needing the known good output file ... ML baby ! – Scott Stensland Apr 17 '18 at 12:16
  • This project is actually going to be used with machine learning, building an FFT database of chords to run through tensorflow. Unfortunately if I had all the good chords already at my disposal I wouldn't need to create these note pairings. I'm looking at creating many permutations of chords in order to train tensorflow. – m1st3rnutso Apr 19 '18 at 05:49
  • I think this link involves every answer related to `Audio_Processing` whether it is *Pre-Processing* or *Post-Processing*: [Android_Audio_Processing_Using_WebRTC](https://github.com/mail2chromium/Android-Audio-Processing-Using-WebRTC), You can also visit this reference: https://stackoverflow.com/a/58546599/10413749 – Muhammad Usman Bashir Apr 07 '20 at 08:47

1 Answer


In case anyone is actually interested in this question, I have found a solution to my problem. It's a little convoluted, but it has yielded excellent results.

To find the time of peak amplitude of a sample, I found this thread: Finding the 'volume' of a .wav at a given time, where the top answer links to a Scala library called AudioFile, which provides a method to find the peak amplitude by stepping through a sample in frame-buffer windows. However, this library required all files to be in .aiff format, so a second library of samples was created by converting all the old .wav samples to .aiff.

After reducing the frame buffer window, I was able to determine in which frame the highest amplitude was found. Dividing this frame index by the sample rate of the audio samples (known to be 48,000 Hz), I was able to accurately find the time of peak amplitude. This information was used to create a file which stored both the name of the sample file and its time of peak amplitude.
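The answer uses the Scala AudioFile library, but the same frame-buffer scan is easy to sketch in Python with NumPy. This is a sketch, not the author's code: the frame size and the synthetic test note are illustrative assumptions.

```python
import numpy as np

SAMPLE_RATE = 48000  # known sample rate from the answer


def peak_time(samples, frame_size=256):
    """Scan the signal in frame-buffer windows and return the start time
    (in seconds) of the frame containing the highest peak amplitude."""
    samples = np.asarray(samples, dtype=float)
    n_frames = int(np.ceil(len(samples) / frame_size))
    frame_peaks = [np.max(np.abs(samples[i * frame_size:(i + 1) * frame_size]))
                   for i in range(n_frames)]
    best = int(np.argmax(frame_peaks))
    return best * frame_size / SAMPLE_RATE


# synthetic 1-second "note": a 440 Hz tone with an attack peak near 0.25 s
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
note = np.sin(2 * np.pi * 440 * t) * np.exp(-((t - 0.25) ** 2) / 0.002)
print(peak_time(note))  # ≈ 0.25
```

Smaller frame sizes give finer time resolution at the cost of more iterations, which matches the answer's note about reducing the frame buffer window.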

Once this was accomplished, a Python script was written using the Pydub library (http://pydub.com/) which would pair up two samples and find the difference (t) in their times of peak amplitude. The sample with the lower time of peak amplitude would have silence of length (t) prepended to it from a .wav containing only silence.

These two samples were then overlaid onto each other to produce the accurately mixed chord!
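The silence-prepend-and-overlay arithmetic can be sketched with plain NumPy arrays; the answer's actual script works on files via Pydub (`AudioSegment.silent` plus `overlay`), but the offset logic is the same. The toy impulse "notes" below are illustrative assumptions.

```python
import numpy as np


def align_and_mix(a, b):
    """Prepend silence to the earlier-peaking note so both peaks
    coincide, then mix the two notes into one chord."""
    pa = int(np.argmax(np.abs(a)))  # peak index of note a
    pb = int(np.argmax(np.abs(b)))  # peak index of note b
    shift = abs(pa - pb)            # the difference (t), in samples
    if pa < pb:
        a = np.concatenate([np.zeros(shift), a])
    else:
        b = np.concatenate([np.zeros(shift), b])
    # pad both to equal length, then average to avoid clipping
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))
    b = np.pad(b, (0, n - len(b)))
    return (a + b) / 2


# toy "notes": unit impulses standing in for the attack peaks
a = np.zeros(100); a[10] = 1.0
b = np.zeros(100); b[30] = 1.0
chord = align_and_mix(a, b)  # both peaks now land at index 30
```

With Pydub, the equivalent shift would be `AudioSegment.silent(duration=t_ms) + earlier_note`, followed by `later_note.overlay(shifted)`.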

  • nice ... an alternative to just picking the sample with greatest amplitude, which may be vulnerable to spurious spikes outside the intended max amplitude point, would be to slide a window of a few samples across the audio file and feed that set of samples into an RMS calculation to identify a power metric – Scott Stensland Apr 20 '18 at 13:44