I want to apply a CNN to classify a sound as speech or non-speech, and I have computed MFCC features for it. I looked into some NN libraries such as Caffe, but they seem to expect images as input. Can anyone suggest how I can use MFCCs as input to a CNN?
-
Sorry mate, but this is off-topic and/or too broad. – Khalil Khalaf Jun 15 '16 at 13:13
-
Caffe is not restricted to image input. Try looking into the `"HDF5Data"` input layer. – Shai Jun 15 '16 at 13:27
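To illustrate the `"HDF5Data"` suggestion, here is a minimal sketch of packing MFCC matrices into the 4-D `N x C x H x W` layout and writing them to an HDF5 file. It rests on several assumptions that are not from the thread: every clip has the same number of frames, there are 13 coefficients per frame, the network's top blobs are named `data` and `label`, and the helper names `pack_mfcc` and `write_hdf5` are hypothetical. The random arrays stand in for real MFCCs.

```python
import numpy as np


def pack_mfcc(clips, labels):
    """Stack per-clip MFCC matrices (n_coeffs x n_frames) into an N x 1 x H x W array."""
    # Assumes every clip has the same shape; pad or crop beforehand otherwise.
    data = np.stack([np.asarray(c, dtype=np.float32) for c in clips])  # N x H x W
    data = data[:, np.newaxis, :, :]  # add a single "channel" axis -> N x 1 x H x W
    labels = np.asarray(labels, dtype=np.float32)
    return data, labels


def write_hdf5(path, data, labels):
    """Write the arrays under dataset names assumed to match the layer's top blobs."""
    import h5py  # third-party; imported here so pack_mfcc works without it

    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data)
        f.create_dataset("label", data=labels)


# Placeholder data: 8 clips, 13 coefficients, 100 frames each (not real MFCCs).
rng = np.random.default_rng(0)
clips = [rng.standard_normal((13, 100)) for _ in range(8)]
labels = [0, 1, 0, 1, 1, 0, 0, 1]  # hypothetical: 1 = speech, 0 = non-speech

data, lab = pack_mfcc(clips, labels)
print(data.shape, lab.shape)
```

Calling `write_hdf5("train.h5", data, lab)` would then produce the file; the layer's `source` parameter points at a text file that lists such `.h5` paths, one per line.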
-
It's quite likely too broad, but not off-topic. Our company sells sound classifiers which are a bit more refined (e.g. we can also distinguish aggressive speech), so we're entirely familiar with this area. CNNs in general are not (yet) suited for audio, and an MFCC representation of audio would certainly be unsuitable for a CNN. The reason CNNs want images is to have a meaningful convolution operation (that's why it is a **C**NN). There's no reasonable convolution over an MFCC representation. – MSalters Jun 15 '16 at 13:42
-
@MSalters I also thought of using spectrograms with a CNN, but as I mostly need to classify non-verbal voices, I am doubtful how well it will work in this particular case. I will still try Shai's suggestion; maybe I will get some better results. – CuriousCase Jun 15 '16 at 13:46
-
@neha: Without wanting to be negative, the problem here is a CS/math problem, not a file format problem. Using an "HDF5" file format isn't going to magically give you a convolution, and until you have that convolution you don't have a CNN. – MSalters Jun 15 '16 at 13:58
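For reference, the operation under discussion can be sketched as follows: a single filter slid along the time axis of an MFCC matrix, spanning all coefficients at once. This only shows what such a layer would compute. Whether the result is meaningful for MFCCs is exactly what is disputed above. The function name `conv_time`, the shapes, and the all-ones inputs are assumptions for illustration, and like most NN libraries the sketch computes a cross-correlation (no kernel flip).

```python
import numpy as np


def conv_time(mfcc, kernel):
    """Slide a (n_coeffs x k) filter along the frame axis of a (n_coeffs x n_frames) matrix."""
    n_coeffs, n_frames = mfcc.shape
    assert kernel.shape[0] == n_coeffs, "filter must span all coefficients"
    k = kernel.shape[1]
    out = np.empty(n_frames - k + 1, dtype=np.float64)
    for t in range(n_frames - k + 1):
        # Weighted sum over a window of k consecutive frames ("valid" mode, stride 1).
        out[t] = np.sum(mfcc[:, t:t + k] * kernel)
    return out


# Placeholder input: 13 coefficients, 10 frames, filter width 3.
mfcc = np.ones((13, 10))
kernel = np.ones((13, 3))
response = conv_time(mfcc, kernel)
print(response.shape)
```

With a filter of width 3 over 10 frames, the output has 10 - 3 + 1 = 8 values, one per window position.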
-
@MSalters, not taken negatively, but what I meant was to try MFCCs with a CNN rather than an image to classify. – CuriousCase Jun 15 '16 at 14:01
-
@MSalters I suppose you do not have to be strict about the use of "convolutional" NN; it might be that neha simply confuses it with a deep neural net. – Shai Jun 15 '16 at 14:01
-
@neha: Looking at your profile, you seem to be living almost around the corner. Human Language Technology and Pattern Recognition Group? – MSalters Jun 15 '16 at 14:09
-
@MSalters no, a little far. Visual computing group, with virtual reality. – CuriousCase Jun 15 '16 at 15:01