
Recently I have been working on music transcription using the Shuffle-Exchange Networks approach (https://github.com/LUMII-Syslab/RSE). After running visualizer.py it produces a visualization.npy file, and I wonder how I can turn that NumPy array into a MIDI file.

If I understand correctly, the pictures below are the ground truth and the prediction, each with shape (128, N). I would now like to convert the prediction into a MIDI file. I tried the solution from another discussion (How to read a MP3 audio file into a numpy array / save a numpy array to MP3?), but I only get an empty MIDI file.

import numpy as np
import mido

def arry2mid(ary, tempo=500000):
    # ary must be time-major, shape (N, 128): one row per time step
    # prepend a silent step and take the difference between consecutive steps
    new_ary = np.concatenate([np.array([[0] * 128]), np.array(ary)], axis=0)
    changes = new_ary[1:] - new_ary[:-1]
    # create a midi file with an empty track
    mid_new = mido.MidiFile()
    track = mido.MidiTrack()
    mid_new.tracks.append(track)
    track.append(mido.MetaMessage('set_tempo', tempo=tempo, time=0))
    # turn each change row into note_on/note_off events
    last_time = 0
    for ch in changes:
        if set(ch) == {0}:  # no change at this time step
            last_time += 1
        else:
            on_notes = np.where(ch > 0)[0]
            on_notes_vol = ch[on_notes]
            off_notes = np.where(ch < 0)[0]
            first_ = True
            for n, v in zip(on_notes, on_notes_vol):
                new_time = last_time if first_ else 0
                # note offset 0 for a 128-row roll; the original +21 only fits an 88-key roll
                track.append(mido.Message('note_on', note=int(n), velocity=int(v), time=new_time))
                first_ = False
            for n in off_notes:
                new_time = last_time if first_ else 0
                track.append(mido.Message('note_off', note=int(n), velocity=0, time=new_time))
                first_ = False
            last_time = 0
    return mid_new
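To illustrate what the difference array encodes, here is a minimal sketch (pure NumPy, on a hypothetical 4-step, 4-pitch roll): positive entries in a row of changes mark note-on events at that time step, negative entries mark note-offs.

```python
import numpy as np

# tiny 4-step, 4-pitch roll: pitch 1 sounds for steps 0-2, pitch 3 for step 2 only
roll = np.array([
    [0, 90, 0, 0],
    [0, 90, 0, 0],
    [0, 90, 0, 80],
    [0,  0, 0, 0],
])

# prepend a silent row and diff, exactly as arry2mid does
padded = np.concatenate([np.zeros((1, 4), dtype=int), roll], axis=0)
changes = padded[1:] - padded[:-1]

print(np.where(changes[0] > 0)[0])  # -> [1]: note-on at pitch 1 in the first step
print(np.where(changes[3] < 0)[0])  # -> [1 3]: note-offs at pitches 1 and 3 in the last step
```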

prepare_for_midi = vls[:, :, 0]  # pick the predicted piano-roll
# arry2mid iterates over the first axis as time, so it expects shape (N, 128);
# the transpose to (128, N) is therefore not needed here
print("prepare_for_midi shape: ", prepare_for_midi.shape)
print("prepare_for_midi head:", prepare_for_midi[:3])
# scale to MIDI velocities; velocity must stay within 0..127
prepare_for_midi = np.clip(np.rint(prepare_for_midi * 127), 0, 127).astype(int)
print("prepare_for_midi shape: ", prepare_for_midi.shape)
print("prepare_for_midi head:", prepare_for_midi[:3])
mid_new = arry2mid(prepare_for_midi)
mid_new.save('mid_new.mid')
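One possible cause of an empty MIDI file is the value range of the prediction: raw outputs are probabilities below 1, so casting them straight to int zeroes everything out (and scaling by 128 can overshoot the valid velocity range 0..127). A minimal NumPy-only sketch (the 0.5 threshold and the sample values are assumptions for illustration):

```python
import numpy as np

pred = np.array([[0.02, 0.91], [0.40, 0.73]])  # hypothetical raw predictions in [0, 1]

# values below 1 cast straight to int all become 0 -> every velocity is 0 -> empty MIDI
assert pred.astype(int).sum() == 0

# threshold first, then map active cells into the valid MIDI velocity range 0..127
velocities = np.where(pred > 0.5, np.clip(np.rint(pred * 127), 0, 127), 0).astype(int)
print(velocities)  # active cells become 116 and 93, the rest stay 0
```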

Ground truth: [ground-truth piano-roll image]  Prediction: [prediction piano-roll image]

Moreover, I would like to ask how to run inference on another music file with this repository. The training dataset is originally in npz format (I downloaded it from https://www.kaggle.com/imsparsh/musicnet-dataset?select=musicnet.npz), so I think maybe I can convert an audio file (wav, mp3, etc.) into npy format, but I am not sure whether converting with the audio2numpy package produces the same format as the arrays inside musicnet.npz.
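For getting a wav file into a NumPy array, here is a minimal sketch using only the standard-library wave module; it assumes 16-bit PCM and scales to float32 in [-1, 1]. Whether this matches the exact array layout stored inside musicnet.npz is an assumption that would need to be verified against the dataset (the demo below synthesizes its own wav file so it is self-contained).

```python
import wave
import numpy as np

def wav_to_float_array(path):
    """Read a 16-bit PCM wav file into a float32 array scaled to [-1, 1]."""
    with wave.open(path, 'rb') as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        frames = wf.readframes(wf.getnframes())
        samples = np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
        return samples, wf.getframerate()

# self-contained demo: write one second of a 440 Hz sine wave, then read it back
rate = 44100
t = np.arange(rate) / rate
sine = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
with wave.open('demo.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(rate)
    wf.writeframes(sine.tobytes())

audio, sr = wav_to_float_array('demo.wav')
print(audio.shape, sr)  # -> (44100,) 44100
```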

If any part is unclear, please let me know and I will add more explanation.

starlitsky

1 Answer


The RSE repository has now been updated to include a file transcribe.py, which converts a NumPy array of predictions to a MIDI file. Once the model has been trained, a wav file can be transcribed to MIDI by placing it in the musicnet_data directory and running python3 transcribe.py yourfile.wav.