
This question follows from "OS X Yosemite (10.10) API for continuous speech recognition".

OS X now has superb continuous speech recognition, but it doesn't appear to expose any API for it. I'm building a custom HCI kit, and I need to capture this speech input so I can process it.

How can I intercept it?

My first thought was that it might create a virtual keyboard device through which it sends key-down/key-up events. If that were the case, I could intercept those events using IOKit, but no such device appears when I enumerate my keyboard devices. So it must be something higher-level.
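One way to repeat the check described above is to ask IOHIDManager for every device on the Generic Desktop usage page (0x01) with the Keyboard usage (0x06). The following is a minimal Swift sketch, not a tested tool: the helper names `keyboardMatching` and `listKeyboardNames` are hypothetical, the macOS-only part is wrapped in `#if os(macOS)`, and on newer macOS releases opening the manager may require the Input Monitoring permission.

```swift
#if os(macOS)
import IOKit.hid
#endif

/// Matching dictionary for keyboard-class HID devices.
/// The keys are the string values of kIOHIDDeviceUsagePageKey and kIOHIDDeviceUsageKey;
/// 0x01 is the Generic Desktop usage page and 0x06 is its Keyboard usage.
func keyboardMatching() -> [String: Int] {
    return ["DeviceUsagePage": 0x01, "DeviceUsage": 0x06]
}

#if os(macOS)
/// Returns the product name of every connected HID device that reports itself as a keyboard.
func listKeyboardNames() -> [String] {
    let options = IOOptionBits(kIOHIDOptionsTypeNone)
    let manager = IOHIDManagerCreate(kCFAllocatorDefault, options)
    IOHIDManagerSetDeviceMatching(manager, keyboardMatching() as CFDictionary)

    // Opening may fail, e.g. if the process lacks permission to read HID devices.
    guard IOHIDManagerOpen(manager, options) == kIOReturnSuccess else { return [] }
    defer { IOHIDManagerClose(manager, options) }

    guard let devices = IOHIDManagerCopyDevices(manager) as? Set<IOHIDDevice> else { return [] }
    return devices.map { device in
        IOHIDDeviceGetProperty(device, kIOHIDProductKey as CFString) as? String ?? "(unnamed)"
    }
}
#endif
```

Printing `listKeyboardNames()` before and during dictation would show whether any extra keyboard-class device appears; per the observation above, none does.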

Please note that I'm adding the 'hacking' tag, as there appears to be no ready-made path; this is clearly something Apple did not intend to provide.

EDIT: see also these related questions:
  • How to use DictationServices.framework
  • Can I use OS X 10.8's speech recognition/dictation without a GUI?

Asked by P i
  • What are you trying to intercept, exactly? The audio input? The text output? If the latter, can you not read it from the text widget it goes into? – rhashimoto Jun 04 '15 at 17:53

1 Answer


Sadly, NSSpeechRecognizer only listens for a fixed array of command phrases (I mention that because you brought it up in your linked question). I've looked at a few different ways to capture the input, but they're all pretty hacky.
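To illustrate the limitation, here is a minimal Swift sketch of NSSpeechRecognizer: the recognizer is given a fixed `commands` array, and its delegate is only called with one of those strings, so free-form dictation never comes through. The phrases and the `CommandListener` class are hypothetical examples, and the AppKit part is wrapped in `#if os(macOS)`.

```swift
#if os(macOS)
import AppKit
#endif

/// Hypothetical fixed vocabulary: NSSpeechRecognizer can only match phrases from a list like this.
let commandPhrases = ["open mail", "next page", "stop listening"]

#if os(macOS)
/// Listens for the phrases above and forwards whichever one was recognized.
final class CommandListener: NSObject, NSSpeechRecognizerDelegate {
    private let recognizer: NSSpeechRecognizer?
    private let onCommand: (String) -> Void

    init(onCommand: @escaping (String) -> Void) {
        // NSSpeechRecognizer's initializer is failable; nil means recognition is unavailable.
        self.recognizer = NSSpeechRecognizer()
        self.onCommand = onCommand
        super.init()
        recognizer?.commands = commandPhrases
        recognizer?.delegate = self
    }

    func start() { recognizer?.startListening() }
    func stop() { recognizer?.stopListening() }

    // Called only when speech matches one of the strings in `commands`;
    // anything else the user says is never delivered.
    func speechRecognizer(_ sender: NSSpeechRecognizer, didRecognizeCommand command: String) {
        onCommand(command)
    }
}
#endif
```

Keep a strong reference to the listener (for example `let listener = CommandListener { print("heard:", $0) }` followed by `listener.start()`) and keep the app's run loop running; speech that does not match an entry in `commandPhrases` is simply not reported.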

The most popular way to "intercept" the speech is to trigger the dictation command (pressing Fn twice, unless the user has changed the shortcut) and let the dictated text land in a text field you control. Not exactly elegant, especially for an HCI kit.
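A minimal Swift sketch of the text-field route, assuming the dictated text lands in an NSTextView that your own app owns and that is first responder when the user presses the dictation shortcut. It observes `NSText.didChangeNotification` and forwards whatever was inserted since the previous change. `DictationCapture` and `insertedText(previous:current:)` are hypothetical helpers, and because dictation can revise words it has already inserted, each callback should be treated as a best-effort delta rather than a clean transcript.

```swift
#if os(macOS)
import AppKit
#endif

/// Returns the span of `current` that differs from `previous`, assuming a single
/// contiguous edit (as when dictation inserts a phrase). If the edit replaced text,
/// the replacement span is returned; if the text got shorter, returns "".
func insertedText(previous: String, current: String) -> String {
    let old = Array(previous), new = Array(current)
    guard new.count >= old.count else { return "" }
    // Length of the common prefix.
    var prefix = 0
    while prefix < old.count && old[prefix] == new[prefix] { prefix += 1 }
    // Length of the common suffix, not overlapping the prefix.
    var suffix = 0
    while suffix < old.count - prefix && old[old.count - 1 - suffix] == new[new.count - 1 - suffix] {
        suffix += 1
    }
    return String(new[prefix..<(new.count - suffix)])
}

#if os(macOS)
/// Watches an NSTextView and forwards each insertion (dictated or typed).
final class DictationCapture: NSObject {
    private var lastText = ""
    private var observer: NSObjectProtocol?

    init(textView: NSTextView, onInsert: @escaping (String) -> Void) {
        super.init()
        lastText = textView.string
        observer = NotificationCenter.default.addObserver(
            forName: NSText.didChangeNotification, object: textView, queue: .main
        ) { [weak self, weak textView] _ in
            guard let self = self, let textView = textView else { return }
            let text = textView.string
            let inserted = insertedText(previous: self.lastText, current: text)
            self.lastText = text
            if !inserted.isEmpty { onInsert(inserted) }
        }
    }

    deinit {
        if let observer = observer { NotificationCenter.default.removeObserver(observer) }
    }
}
#endif
```

The text view can be hidden off-screen or kept small, but it must be first responder for dictation to target it.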

If you're feeling frisky, you could take a look at the private framework DictationServices, but all the standard warnings apply: App Store rejection, "here be dragons," etc.

Answered by Sabrina
  • Stef is right: even if you somehow "hack" OS X speech recognition, you will probably have issues with the App Store, etc. Why not use an open-source framework for this? For instance: http://cmusphinx.sourceforge.net/ – Tom Jun 11 '15 at 07:11
  • CMUSphinx appears to be the underlying base of all commercial engines. In fact, it's the base of Nuance's technology, and IIRC Apple licenses this. But the original will surely be significantly behind its commercial offspring, not least because of a lack of training data. – P i Jun 11 '15 at 10:28
  • I'm not daunted by App Store rejection, as I am looking for a solution for my own use. So maybe `DictationServices` is where I should look next. It may be possible to superimpose an invisible overlay window that intercepts and re-emits speech data. – P i Jun 11 '15 at 10:30