I have a audio data set and each of them has different length. There are some events in these audios, that I want to train and test but these events are placed randomly, plus the lengths are different, it is really hard to build a machine learning system with using that dataset. I thought fixing a default size of length and build a multilayer NN however, the length's of events are also different. Then I thought about using CNN, like it is used to recognise patterns or multiple humans on an image. The problem for that one is I am really struggling when I try to understand the audio file.
So, my questions, Is there anyone who can give me some tips about building a machine learning system that classifies different types of defined events with training itself on a dataset that has these events randomly(1 data contains more than 1 events and they are different from each other.) and each of them has different lenghts?
I will be so appreciated if anyone helps.