Is there any way to incorporate Dragon NaturallySpeaking (DNS) into an event-driven program? My boss would really like it if I used DNS to record user voice input without writing it to the screen, saving it directly to XML instead. I've been researching this for several days now and I cannot see a way to do it without the (really expensive) SDK, and I don't even know whether it would work then.

With Microsoft's speech API you can write a (Python) program in which the speech recognizer waits until it detects a speech event and then processes it. It also has the handy ability to suggest alternative phrases to the one it thinks is the best guess, and to record the .wav file for later use. Sample code:

import win32com.client

class RecoEventHandler(SpRecoContext):
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        res = win32com.client.Dispatch(Result)
        phrase = res.PhraseInfo.GetText()
        # from here I would save it as XML

        # write reco phrases (the NBEST alternatives to the best guess)
        altPhrases = res.Alternates(NBEST)
        for altPhrase in altPhrases:
            nodePhrase = self.doc.createElement(TAG_PHRASE)

# register the handler once the class is defined
spEngine = MsSpeech()
spEngine.setEventHandler(RecoEventHandler(spEngine.context))
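
The XML step that the handler only hints at could be sketched roughly as follows. This is an illustration, not the code from my program: the tag names other than TAG_PHRASE, the `rank` attribute, and the helper name `utterance_to_xml` are all made up, and the phrases are plain strings rather than real recognizer results.

```python
from xml.dom import minidom

# Hypothetical tag names; the sample above only mentions TAG_PHRASE.
TAG_UTTERANCE = "utterance"
TAG_PHRASE = "phrase"


def utterance_to_xml(doc, best_phrase, alternates):
    """Build one <utterance> element holding the best guess and its alternates."""
    utterance_el = doc.createElement(TAG_UTTERANCE)
    for rank, text in enumerate([best_phrase] + list(alternates)):
        phrase_el = doc.createElement(TAG_PHRASE)
        phrase_el.setAttribute("rank", str(rank))  # 0 = recognizer's best guess
        phrase_el.appendChild(doc.createTextNode(text))
        utterance_el.appendChild(phrase_el)
    return utterance_el


# Stand-in data instead of a live recognition event.
doc = minidom.Document()
root = doc.createElement("session")
doc.appendChild(root)
root.appendChild(utterance_to_xml(doc, "hello world", ["hello word", "yellow world"]))
xml_text = doc.toxml()
```

Keeping the alternates next to the best guess in one element means a later pass over the XML can pick a different phrase without touching the recognizer again.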

I cannot seem to make DNS do this. The closest I've managed to rig up is:

while keepGoing:
    yourWords = raw_input("Your input: ")
    transcript_el = createTranscript(doc, "user", yourWords)
    speech_el.appendChild(transcript_el)
    if yourWords == 'bye':
        break

It even has the horrible side effect of making the user say "new-line" after every sentence! Not the preferred solution at all! Is there any way to make DNS do what Microsoft Speech does?

FYI: I know the logical solution would be to simply switch to Microsoft Speech, but let's assume, just for grins and giggles, that that is not an option.

UPDATE - Has anyone bought the SDK? Did you find it useful?

Danni
  • @WarrenP: This guy uses it for 40%-60% of his development. It's true that out of the box it's not useful, but using Natlink and VI/Emacs he's got a pretty sweet setup. http://www.youtube.com/watch?v=8SkdfdXWYaI – Jabavu Adams Sep 05 '13 at 15:44
  • @WarrenP Have you even tried it? I use it all the time and it's much faster than using the keyboard (even though I've been using keyboards since I was 6, so I have a pretty high WPM). For programming I agree that it needs improvement, but it's still useful. See [How can we use Dragon NaturallySpeaking to code more efficiently?](http://productivity.stackexchange.com/q/3605/2476) – Franck Dernoncourt Apr 14 '14 at 15:46
  • I can type at 120 WPM. I have never seen ENGLISH speech-to-text users hit 40 WPM. Define HIGH WPM? – Warren P Apr 15 '14 at 20:06
  • @WarrenP if you have RSI, you type at only 0 WPM – sam boosalis Jul 26 '14 at 18:26

1 Answer

Solution: download Natlink (http://qh.antenna.nl/unimacro/installation/installation.html). It's not quite as flexible as SAPI, but it covers the basics and I got almost everything I needed out of it. Also, heads up: both Natlink and Python need to be installed for all users on your machine or it won't work properly, and it works with every version of Python BUT 2.4.

Documentation for all supported commands is in C:\NatLink\NatLink\MiscScripts\natlink.txt once Natlink is installed. The command reference comes after the list of updates at the top of the file.

Example code:

import natlink

# hypothetical wrapper name; the early return below needs a function
def runLogger():
    # make sure DNS is running before you start
    if not natlink.isNatSpeakRunning():
        raiseError('must start up Dragon NaturallySpeaking first!')  # helper not shown here
        shutdownServer()  # helper not shown here
        return
    # connect to natlink and load the grammar it's supposed to recognize
    natlink.natConnect()
    loggerGrammar = LoggerGrammar()  # grammar class not shown here
    loggerGrammar.initialize()
    if natlink.getMicState() == 'off':
        natlink.setMicState('on')
    userName = 'Danni'
    natlink.openUser(userName)
    # natlink.waitForSpeech() blocks, waiting for input.
    # Results are sent to the gotResultsObject method of the logger grammar.
    natlink.waitForSpeech()
    natlink.natDisconnect()
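
The gotResultsObject method mentioned in the comments is where the recognized words arrive. Below is a rough sketch of pulling the best guess and its alternatives out of a results object. It assumes the object offers getWords(choice) as described in natlink.txt and raises an error once choice runs past the last alternative (I believe natlink calls that error OutOfRange, but check natlink.txt). The helper name `collect_choices` and the `FakeResults` stand-in are made up so the snippet can run without Dragon.

```python
def collect_choices(res_obj, limit=10, out_of_range=Exception):
    """Return up to `limit` recognition choices as strings, best guess first.

    `out_of_range` is the exception type that signals "no such choice";
    with a real results object this would be natlink's own error class (assumption).
    """
    choices = []
    for choice in range(limit):
        try:
            words = res_obj.getWords(choice)
        except out_of_range:
            break  # ran past the last alternative
        choices.append(" ".join(words))
    return choices


class FakeResults(object):
    """Stand-in for a natlink results object, for trying the helper offline."""

    def __init__(self, alternatives):
        self._alternatives = alternatives

    def getWords(self, choice):
        # IndexError plays the role of the out-of-range error here.
        return self._alternatives[choice]


fake = FakeResults([["hello", "world"], ["yellow", "world"]])
choices = collect_choices(fake, out_of_range=IndexError)
```

Inside a grammar class, the same helper would be called from gotResultsObject with the results object it receives, and the returned list could then be written out as XML.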

The code is severely abbreviated from my production version, but I hope you get the idea. The only problem now is that I still have to return to the mini-window that natlink.waitForSpeech() creates and click 'close' before I can exit the program safely. A way to signal the window to close from Python without using the timeout parameter would be fantastic.

Danni