How to train on and make a serialized feature vector for a Neural Network?

Question

By serialized i mean that the values for an input come in discrete intervals of time and that size of the vector is also not known before hand. Conventionally the neural networks employ fixed size parallel input neurons and fixed size parallel output neurons.

A serialized implementation could be used in speech recognition where i can feed the network with a time series of the waveform and on the output end get the phonemes.

It would be great if someone can point out some existing implementation.

score 3 · Accepted Answer · edited May 23 '17 at 11:59

Simple neural network as a structure doesn't have invariance across time scale deformation that's why it is impractical to apply it to recognize time series. To recognize time series usually a generic communication model is used (HMM). NN could be used together with HMM to classify individual frames of speech. In such HMM-ANN configuration audio is split on frames, frame slices are passed into ANN in order to calculate phoneme probabilities and then the whole probability sequence is analyzed for a best match using dynamic search with HMM.

HMM-ANN system usually requires initialization from more robust HMM-GMM system thus there are no standalone HMM-ANN implementation, usually they are part of a whole speech recognition toolkit. Among popular toolkits Kaldi has implementation for HMM-ANN and even for HMM-DNN (deep neural networks).

There are also neural networks which are designed to classify time series - recurrent neural networks, they can be successfully used to classify speech. The example can be created with any toolkit supporting RNN, for example Keras. If you want to start with recurrent neural networks, try long-short term memory networks (LSTM), their architecture enables more stable training. Keras setup for speech recognition is discussed in Building Speech Dataset for LSTM binary classification

score 3 · Answer 2 · answered Oct 18 '13 at 05:07

There are several types of neural networks that are intended to model sequence data; I would say most of these models fit into an equivalence class known as a recurrent neural network, which is generally any neural network model whose connection graph contains a cycle. The cycle in the connection graph can typically be exploited to model some aspect of the past "state" of the network, and different strategies -- for example, Elman/Jordan nets, Echo State Networks, etc. -- have been developed to take advantage of this state information in different ways.

Historically, recurrent nets have been extremely difficult to train effectively. Thanks to lots of recent work in second-order optimization tools for neural networks, along with research from the deep neural networks community, several recent examples of recurrent networks have been developed that show promise in modeling real-world tasks. In my opinion, one of the neatest current examples of such a network is Ilya Sutskever's "Generating text with recurrent neural networks" (ICML 2011), in which a recurrent net is used as a very compact, long-range n-gram character model. (Try the RNN demo on the linked homepage, it's fun.)

As far as I know, recurrent nets have not yet been applied successfully to speech -> phoneme modeling directly, but Alex Graves specifically mentions this task in several of his recent papers. (Actually, it looks like he has a 2013 ICASSP paper on this topic.)

There was some success in the 90's applying RNNs to speech recognition. http://www.cstr.ed.ac.uk/downloads/publications/1996/rnn4csr96.pdf — Paul Dixon, Nov 28 '13 at 05:12

How to train on and make a serialized feature vector for a Neural Network?

2 Answers2

Linked