6

There isn't a single working example on the whole internet of how to perform FFT analysis of a sound file/buffer/AudioBuffer in the browser without needing playback. The Web Audio API has changed too much for a library like https://github.com/corbanbrook/dsp.js to be usable any more, for example. All other leads I have found currently don't lead to a solution.

EDIT: I don't need to manipulate any of the data, just to read the frequency spectrum at different moments in time of the audio. The input to the solution can be any form of data (WAV file, ArrayBuffer, AudioBuffer, anything) but not a stream. The expected output would ideally be an array (moments in time) of arrays (frequency-bin amplitudes).

okram
  • Are you trying to measure (sample) actual audio output or raw data? The question asks for analysis. The file is static; the data does not change without user action. The purpose of analysis would be to sample actual audio output, that is, the audio output is expected to be different, or at least the sampling record is expected to be unique, or arbitrary, for each media playback, correct? – guest271314 Feb 16 '19 at 18:36
  • Raw data. I need to avoid playback output for a faster user experience. – okram Feb 16 '19 at 18:43
  • What is the raw data, a static file not being played back, analyzed for? What do you mean by "without need for playback"? What is the requirement and expected input and output? Are you trying to manipulate a file to filter certain audio output before playing the file or having to play the media back at all to perform the filtering? – guest271314 Feb 16 '19 at 18:45
  • AFAIK, you are right. It would actually be slower not to use the Web Audio API's C-accelerated tools, trying to mount and analyze the waveforms in pure userland JS. I don't even know how you would turn the MP3 into a wave. If you do go all JS, use web workers to avoid UI slowdown. – dandavis Feb 16 '19 at 18:58
  • @okram See [TensorFlow](https://github.com/tensorflow/tensorflow); [TensorFlow.js](https://js.tensorflow.org/) – guest271314 Feb 16 '19 at 19:00
  • I don't need to manipulate anything, just to read the frequency spectrum at different moments in time of the audio. Input can be any form of data (WAV file, ArrayBuffer, AudioBuffer, anything) but not a stream. Expected output would ideally be an array (moments in time) of arrays (frequency-bin amplitudes). – okram Feb 16 '19 at 19:01
  • @okram With the requirement being without playback? If you create an `OfflineAudioContext`, one or more `SourceBuffer`s can be created, merged, and analyzed. If you have the models, you can compare the resulting `AudioBuffer` or `TypedArray` data to the model data. Unless I am not gathering what the requirement is? – guest271314 Feb 16 '19 at 19:08
  • I don't understand why I would need any models or machine learning for this task. If you can provide a working example of the tools you suggest (offline context, source-buffer analysis), that would be awesome. I already tried everything I could read on the Mozilla website about the audio API. – okram Feb 16 '19 at 19:12
  • @okram Still not certain what the expected output is. The closest I can gather as to what you are trying to achieve, based on interpreting the question relative to what you have tried, are [Is it possible to mix multiple audio files on top of each other preferably with javascript](https://stackoverflow.com/q/40570114/) and [How to use Blob URL, MediaSource or other methods to play concatenated Blobs of media fragments?](https://stackoverflow.com/q/45217962/). You can implement your own analysis in the code. Still not sure what you mean by analysis (sampling) without playback. – guest271314 Feb 16 '19 at 19:16
  • @okram [Web audio analyser node - run at regular interval](https://stackoverflow.com/q/43191204/) _"Yeah, you can't really use an analyzer. There's too much uncertainty in when it will get run, and you can't guarantee precisely when it will run. You're better off using a ScriptProcessor for now (AudioWorklet eventually), and doing the FFT (or other recognition code) yourself."_ – guest271314 Feb 16 '19 at 19:18
  • @okram At Chromium/Chrome it is possible to use Native Messaging to transfer the data to a shell script for processing, then transfer the output back to the browser `window`: [How to programmatically send a unix socket command to a system server autospawned by browser or convert JavaScript to C++ souce code for Chromium?](https://stackoverflow.com/questions/48219981/); [Chrome Native messaging with PHP](https://stackoverflow.com/questions/47269500/). You can implement the shell script in any language that suits the requirement. – guest271314 Feb 16 '19 at 19:21
  • @okram _"why I would need any models or machine learning for this task."_ You can train models to match against raw data _"In April 2017, *oogle published a paper, Tacotron: Towards End-to-End Speech Synthesis, where they present a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs."_ [tacotron](https://github.com/keithito/tacotron). E.g., `webkitSpeechRecognition` implementation at Chromium records user audio, sends to remote service, returns transcript. See also [ts-ebml ebml](https://github.com/legokichi/ts-ebml). – guest271314 Feb 16 '19 at 19:38
  • You might also be interested in the source code of [`espeak-ng`](https://en.wikipedia.org/wiki/ESpeakNG). – guest271314 Feb 16 '19 at 19:54

4 Answers

3

If you must use WebAudio, the way to do it is to use an OfflineAudioContext. Then, whenever you need frequency data, call suspend(time). Something like the following:

const c = new OfflineAudioContext(2, 44100 * 40, 44100); // e.g. 2 channels, 40 s at 44.1 kHz
const a = new AnalyserNode(c);
src.connect(a);  // src is the signal you want to analyze.

const array1 = new Float32Array(a.frequencyBinCount);
const array2 = new Float32Array(a.frequencyBinCount);

// Schedule the suspensions before rendering starts.
c.suspend(t1)
  .then(() => {
    a.getFloatFrequencyData(array1); // spectrum at time t1
  })
  .then(() => c.resume());

c.suspend(t2)
  .then(() => {
    a.getFloatFrequencyData(array2); // spectrum at time t2
  })
  .then(() => c.resume());

// More suspends if needed

// Render everything now
c.startRendering()
  .then((buffer) => {
    // Maybe do something now that all the frequency data is available.
  });

However, I think only Chrome supports suspend with an offline context.
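
For completeness, a rough sketch of one way to create src from encoded audio data, assuming it arrives as an ArrayBuffer; the URL and variable names are illustrative, while decodeAudioData and createBufferSource are standard Web Audio API:

// Inside an async function; `c` and `a` are the context and analyser above.
const encoded = await fetch('audio.wav').then((r) => r.arrayBuffer()); // illustrative source
const decoded = await c.decodeAudioData(encoded);
const src = c.createBufferSource();
src.buffer = decoded;
src.connect(a);           // feed the analyser
a.connect(c.destination); // ensure the graph is pulled during offline rendering
src.start(0);             // the source must be started before startRendering()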

Raymond Toy
  • Thank you, interesting to learn about this option. Unfortunately I need cross-browser functionality, and Firefox and Safari currently don't support the suspend method on an offline context according to https://developer.mozilla.org/en-US/docs/Web/API/OfflineAudioContext/suspend – okram Mar 16 '19 at 18:26
  • Yeah, that's too bad. I did file a bug against Firefox about adding this. You could file a bug for Safari. Maybe it will get implemented some day. – Raymond Toy Mar 21 '19 at 02:46
2

You can do a lot with an OfflineAudioContext, but it will just run the whole node graph as fast as possible to render a resulting chunk of audio. I don't see how an AnalyserNode would even work in such a situation (since its audio output is useless).

It seems to me that you're correct in that you can't use the Web Audio API without actually playing the file in real time. You would have to do the analysis yourself; there should be plenty of libraries available for that (since it's just number crunching), as sketched below. Web workers or WASM are probably the way to go.
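
As a rough illustration of that number crunching, here is a minimal sketch, assuming the samples have already been decoded to a Float32Array (for example via AudioBuffer.getChannelData(0)): a hand-rolled radix-2 FFT stepped across the samples, producing an array (moments in time) of magnitude arrays. Function names, frame size, and hop size are illustrative; a maintained FFT library would be faster and better tested.

// Minimal iterative radix-2 Cooley-Tukey FFT, in place; n must be a power of two.
function fft(re, im) {
  const n = re.length;
  // Bit-reversal permutation
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  // Butterfly passes
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    const half = len >> 1;
    for (let i = 0; i < n; i += len) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(ang * k), wi = Math.sin(ang * k);
        const j1 = i + k, j2 = j1 + half;
        const vr = re[j2] * wr - im[j2] * wi;
        const vi = re[j2] * wi + im[j2] * wr;
        re[j2] = re[j1] - vr; im[j2] = im[j1] - vi;
        re[j1] += vr;         im[j1] += vi;
      }
    }
  }
}

// Step a window across decoded PCM samples and collect magnitude spectra.
function spectrogram(samples, frameSize = 2048, hop = 1024) {
  const frames = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    const re = new Float64Array(frameSize);
    const im = new Float64Array(frameSize);
    for (let i = 0; i < frameSize; i++) {
      // Hann window to reduce spectral leakage
      const w = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (frameSize - 1)));
      re[i] = samples[start + i] * w;
    }
    fft(re, im);
    const mags = new Float64Array(frameSize / 2);
    for (let k = 0; k < frameSize / 2; k++) {
      mags[k] = Math.hypot(re[k], im[k]); // magnitude of bin k
    }
    frames.push(mags); // one array of frequency-bin magnitudes per time slice
  }
  return frames; // array (moments in time) of arrays (bin magnitudes)
}

Running this in a web worker keeps the UI thread responsive, per the comments above.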

Eindbaas
  • I can only find 'thin wrapper around the Web Audio API' libraries. Do you have any personal recommendation for a library that has a documented solution for my problem? – okram Feb 16 '19 at 19:57
  • I don't have experience with what you want to do, but frequency analysis is just running through numbers and not very obscure, so there should be tons of existing code in JS. You shouldn't be searching for anything related to the Web Audio API though – we both agreed that it probably wasn't going to work with that :) – Eindbaas Feb 16 '19 at 20:42
1

You need 4 things:

  • JavaScript code to read in a WAV file as a binary blob

  • Code to convert slices of that blob, as 16-bit samples, into suitable JavaScript arrays of numeric samples for an FFT (a sketch of this step follows below)

  • A JavaScript implementation of a DFT or FFT, sized suitably for the time and frequency resolution you desire

  • Code to estimate your desired frequency and magnitude parameters as you step-and-repeat the FFT across your data slices

The first three can be found via web searches (GitHub, here, et al.).
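
A minimal sketch of the second bullet, assuming a canonical 44-byte-header, 16-bit little-endian PCM WAV (a robust reader should walk the RIFF chunks instead of hard-coding the offset):

// Sketch: convert 16-bit little-endian PCM bytes to normalized floats.
// Assumes a canonical 44-byte WAV header; names and offsets are illustrative.
function pcm16ToFloats(arrayBuffer, headerBytes = 44) {
  const view = new DataView(arrayBuffer, headerBytes);
  const out = new Float32Array(view.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // scale to [-1, 1)
  }
  return out;
}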

hotpaw2
  • It's bullet point 3 that's the problem. Do you actually know of a JS implementation of the FFT that actually works in the browser? Because sure, then it becomes almost trivially easy (heck, even use an MP3 file with AudioContext.decodeAudioData(), since there's no need for giant PCM wave files if we're going to use pure JS instead of the audio API anyway). – Mike 'Pomax' Kamermans Apr 10 '20 at 05:17
0

The already existing APIs would give you a heavily processed DFT output. First, AnalyserNode applies a Blackman window function. Then it applies the DFT. Then it does exponential smoothing, where α is smoothingTimeConstant. Then it converts the result to a decibel scale. This way you only get the magnitude, not the phase (in case you need it).
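
For reference, a sketch of that smoothing-plus-decibel step applied to one frame of raw magnitudes from your own FFT; tau stands in for smoothingTimeConstant, and the names are illustrative:

// Illustrative AnalyserNode-style post-processing of one frame of magnitudes.
// `prev` holds the previous smoothed frame; `tau` plays the role of
// smoothingTimeConstant.
function smoothToDecibels(mags, prev, tau = 0.8) {
  const db = new Float32Array(mags.length);
  for (let k = 0; k < mags.length; k++) {
    prev[k] = tau * prev[k] + (1 - tau) * mags[k];     // exponential smoothing
    db[k] = 20 * Math.log10(Math.max(prev[k], 1e-12)); // magnitude -> dB, avoid log(0)
  }
  return db;
}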

xsb