
I have made a Python app that runs on a NAO robot and acts as a "friend", with the help of the Watson Speech To Text and Watson Conversation services. The robot alternates between asking questions and answering them.

When the robot is in "asking question" mode, it listens to the human and streams the speech to Watson STT. The speech is recorded with arecord, which is stopped whenever the user finishes talking. The transcribed speech is then sent to Conversation, and the robot answers the question accordingly. The "answering question" mode usually lasts less than 30 seconds, but some replies are long enough that the Watson STT session times out.
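
For context, here is a minimal sketch of that record-and-stream loop. The websocket object ws, the helper name and the chunk size are illustrative only, not the app's actual code:

import subprocess

CHUNK = 1024  # bytes of raw PCM per websocket frame

def stream_microphone(ws):
    # Record signed 16-bit little-endian mono PCM at 16 kHz to stdout.
    reccmd = ["arecord", "-f", "S16_LE", "-r", "16000", "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)
    try:
        while True:
            data = p.stdout.read(CHUNK)
            if not data:
                break
            ws.send(bytearray(data), binary=True)  # audio goes in binary frames
    finally:
        p.terminate()  # stop arecord once the user has finished talking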

To prevent this session timeout, we used to send a "no-op" message every 10 seconds:

def keepAlive(self):
    # Ask the service to keep the recognition session open (now deprecated)
    self.send({"action": "no-op"})

However, Watson recently deprecated sending the "no-op" message to prevent session timeouts. As an alternative, you can send silent audio data to keep the session alive. According to the documentation:

"Sending audio data, including silence, to the service to avoid the 30-second >session timeout. You will be charged for any data sent to the service, >including the silence that you send to extend the session."

So I tried this, sending a short chunk recorded from the microphone instead:

def keepAlive(self):
    # Open a new recognition session; JSON control messages go as text frames
    start = {"action": "start",
             "content-type": "audio/l16;rate=16000",
             "inactivity_timeout": -1}
    self.send(json.dumps(start).encode('utf8'))
    # Grab a short chunk of raw 16-bit PCM from the microphone
    reccmd = ["arecord", "-f", "S16_LE", "-r", "16000", "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)
    data = p.stdout.read(1024)
    p.terminate()
    # Audio is sent as a binary frame, then the session is closed
    self.send(bytearray(data), binary=True)
    self.send(json.dumps({"action": "stop"}))

However, I received this error:

Msg received: {u'error': u'could not detect endianness after looking at the tail 924 non-zero byte string in a data stream of 1024 bytes. Is the bytestream really PCM data?'}

The websocket is also closed immediately after that. What is the correct way to send silent audio data to Watson STT? Or is there any other workaround to prevent the session timeout?

akmalmzamri
  • It's not the answer to your question, but just a comment: instead of using arecord, you could also subscribe to the audio buffer from NAO, as explained here: http://stackoverflow.com/questions/24243757/nao-robot-remote-audio-problems (a minimal sketch of this subscription appears after these comments). – Alexandre Mazel Apr 10 '17 at 17:03
  • @AlexandreMazel is there any benefit of doing that? – akmalmzamri Apr 10 '17 at 17:19
  • It's easier to manage silence and to decide precisely what and when to send packets to Watson. You could also start to analyse on the fly. – Alexandre Mazel Apr 12 '17 at 12:45
  • @AlexandreMazel Thanks for the advice. Will definitely try it later. By the way, what do you mean by "analyse"? And do you think using the audio buffer will produce better results in terms of STT accuracy and speed (compared to arecord or any 3rd-party library)? – akmalmzamri Apr 12 '17 at 12:49
  • When subscribing, you receive a buffer of audio chunks every 170 ms, so you can analyse it to detect if there's voice or ... in a quick manner... achieving this kind of result: https://youtu.be/_rwMH212dC8 – Alexandre Mazel Apr 13 '17 at 16:21
  • @AlexandreMazel I see. If I were to use NAO's built-in speech recognition, is there a way for me to train the STT to recognise certain words over others? For example: catch > cash, echo > account. I'm thinking of not using Watson STT in the future but relying completely on NAO's STT. – akmalmzamri Apr 16 '17 at 06:05
  • You can start the ASR/STT, then do a simple replace match in string space, e.g. in Python: answer_from_stt.replace("catch", "cash") ... – Alexandre Mazel Apr 17 '17 at 19:05
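
Following up on the audio-buffer suggestion above, here is a minimal sketch of subscribing to NAO's microphones through ALAudioDevice with the NAOqi Python SDK. The robot address, module name and callback body are placeholders, not tested code:

from naoqi import ALBroker, ALModule, ALProxy

NAO_IP = "nao.local"  # placeholder robot address
NAO_PORT = 9559

class AudioStreamerModule(ALModule):
    def __init__(self, name):
        ALModule.__init__(self, name)
        self.audio = ALProxy("ALAudioDevice")
        # 16 kHz is only available on the front microphone (channel flag 3);
        # the last argument disables deinterleaving.
        self.audio.setClientPreferences(name, 16000, 3, 0)
        self.audio.subscribe(name)

    def processRemote(self, nbOfChannels, nbOfSamplesByChannel, timeStamp, inputBuffer):
        # Called with roughly 170 ms of signed 16-bit little-endian PCM.
        # inputBuffer could be forwarded to Watson as a binary frame, or checked
        # here (e.g. with a simple energy threshold) to detect end of speech.
        pass

# The broker lets the robot call processRemote back into this process; the
# module instance must live in a global whose name matches the module name.
broker = ALBroker("audioBroker", "0.0.0.0", 0, NAO_IP, NAO_PORT)
AudioStreamer = AudioStreamerModule("AudioStreamer")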

0 Answers