Recently I have tried to work on Music Transcription, and use the Shuffle-Exchange Networks approach(https://github.com/LUMII-Syslab/RSE), after running the visualizer.py
it will produce the visualization.npy
file, and I wonder how can I turn the NumPy array into a MIDI file.
If I understand correctly, the following pictures are the Ground truth and prediction, with shape (128, N) now I would like to convert the prediction one into a MIDI file, I tried the solution from the other discussion (How to read a MP3 audio file into a numpy array / save a numpy array to MP3?), but just get the empty MIDI file.
def arry2mid(ary, tempo=500000):
# get the difference
new_ary = np.concatenate([np.array([[0] * 128]), np.array(ary)], axis=0) #88
changes = new_ary[1:] - new_ary[:-1]
# create a midi file with an empty track
mid_new = mido.MidiFile()
track = mido.MidiTrack()
mid_new.tracks.append(track)
track.append(mido.MetaMessage('set_tempo', tempo=tempo, time=0))
# add difference in the empty track
last_time = 0
for ch in changes:
if set(ch) == {0}: # no change
last_time += 1
else:
on_notes = np.where(ch > 0)[0]
on_notes_vol = ch[on_notes]
off_notes = np.where(ch < 0)[0]
first_ = True
for n, v in zip(on_notes, on_notes_vol):
new_time = last_time if first_ else 0
track.append(mido.Message('note_on', note=n + 21, velocity=v, time=new_time))
first_ = False
for n in off_notes:
new_time = last_time if first_ else 0
track.append(mido.Message('note_off', note=n + 21, velocity=0, time=new_time))
first_ = False
last_time = 0
return mid_new
prepare_for_midi = vls[:, :, 0] #pick predict piano-roll
prepare_for_midi = np.transpose(prepare_for_midi, [1, 0]) #turn to shape[128, N]
print("prepare_for_midi shape: ", prepare_for_midi.shape)
print("prepare_for_midi head:", prepare_for_midi[:3])
#prepare_for_midi = [np.round(element) for element in prepare_for_midi] # float turn to int
prepare_for_midi = 128 * prepare_for_midi
prepare_for_midi = prepare_for_midi.astype(int)
print("prepare_for_midi shape: ", prepare_for_midi.shape)
print("prepare_for_midi head:", prepare_for_midi[:3])
mid_new = arry2mid(prepare_for_midi)
mid_new.save('mid_new.mid')
Moreover, I would like to ask how to inference another music file by this repository because the training dataset is originally in npz format(I have downloaded it from https://www.kaggle.com/imsparsh/musicnet-dataset?select=musicnet.npz), so I think mabey can try to convert the music file format (wav, mp3...etc.) into npy format, but I didn't sure if using audio2numpy
package to convert can produce the same format as the npy file in the musicnet.npz
.
If I have an unclear part please let me know, I will add more explanation on it.