
The offline recognition section of the TensorFlow.js speech-commands README mentions that we need to "obtain the spectrogram of an audio snippet through a certain means, e.g., by loading the data from a .wav file or synthesizing the spectrogram programmatically".

Can anyone please explain how to obtain a spectrogram from a .wav file in JavaScript? I am unable to find a way.

To further explain, I'll show what I did and what the problem is:

let buffer = fs.readFileSync('zero1.wav');
let input = wav.decode(buffer);

// To make the size of the input equal to 9976, as per the TensorFlow.js
// restriction of (1, 43, 232): 1 * 43 * 232 = 9976
input = input['channelData'][0].slice(1000, 10976);
const x = tf.tensor(input, [1].concat(recognizer.modelInputShape().slice(1)));
const output = await recognizer.recognize(x);

When using the above (note that zero1.wav is a file taken from the training data, so it should yield a high-confidence output), I am getting the following ambiguous output:

[screenshot of the ambiguous recognizer output]

This suggests that the input being passed to recognizer.recognize() is incorrect.

So, how should I convert my wav file to a spectrogram and feed it to recognizer.recognize()?

Please let me know if any clarification is required. Any help is appreciated.

  • Does this answer your question? [How to convert wav file to spectrogram for tensorflowjs with columnTruncateLength: 232 and numFramesPerSpectrogram: 43?](https://stackoverflow.com/questions/58109632/how-to-convert-wav-file-to-spectrogram-for-tensorflowjs-with-columntruncatelengt) – edkeveked Aug 19 '20 at 09:49
  • Thanks for this. I had already taken a look at it, and it does not, although it is exactly the same question. I will explain my question further @edkeveked – ARPIT PRASHANT BAHETY Aug 19 '20 at 10:01
  • @edkeveked I have edited the question. Please take a look and any help is appreciated! – ARPIT PRASHANT BAHETY Aug 19 '20 at 10:21
  • What is `wav.decode(buffer)` returning, and why do you have to slice and concatenate to the `input['channelData']`? If `zero1.wav` is obtained from the training data, there should be no need to slice it unless the same processing is applied before training on the data. – edkeveked Aug 19 '20 at 10:41
  • As to "there should be no need to slice it unless the same processing is applied before training on the data" - I don't understand what kind of processing was applied to the training data. Hence, I am unable to pre-process my inference data. Request your help with it! – ARPIT PRASHANT BAHETY Aug 19 '20 at 10:57
  • Why are you slicing `1000, 10976`? Is there some reason for that range? The array sliced will have the length of 976. How do you make sure that its size is 9976 instead? – edkeveked Aug 19 '20 at 11:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/220085/discussion-between-arpit-prashant-bahety-and-edkeveked). – ARPIT PRASHANT BAHETY Aug 19 '20 at 11:26
  • Output of wav.decode(buffer) is an **object** that looks like this: [![enter image description here](https://i.stack.imgur.com/7xjs7.png)](https://i.stack.imgur.com/7xjs7.png) Moreover, I need to slice input['channelData'] because, otherwise, I get the following error: [![enter image description here](https://i.stack.imgur.com/dODJ8.png)](https://i.stack.imgur.com/dODJ8.png) – ARPIT PRASHANT BAHETY Aug 19 '20 at 10:56

1 Answer


You can use the `tf.signal.stft` method (https://js.tensorflow.org/api/latest/#signal.stft) to compute the spectrogram. You'll need to do a little bit of math to compute the parameters, taking the speech-commands recognizer's parameters as a reference.