I'm using Ratchet to connect to IBM Watson websockets. It always seems to work fine for smaller files (I've tested up to a 66-minute, 23 MB mp3 file), but it always fails for larger files (such as a 2-hour, 56 MB mp3).

This is my log:

[2019-03-17 21:43:23] local.DEBUG: \Ratchet\Client\connect bf4e38983775f6e53b392666138b5a3a50e9c9c8  
[2019-03-17 21:43:24] local.DEBUG: startWatsonStream options = {"content-type":"audio\/mpeg","timestamps":true,"speaker_labels":true,"smart_formatting":true,"inactivity_timeout":-1,"interim_results":false,"max_alternatives":1,"word_confidence":false,"action":"start"}  
[2019-03-17 21:43:24] local.DEBUG: Split audio into this many frames: 570222  
[2019-03-17 21:43:42] local.DEBUG: send action stop  
[2019-03-17 21:43:42] local.DEBUG: Received: {
   "state": "listening"
}  
[2019-03-17 21:43:42] local.DEBUG: Received first 'listening' message.  
[2019-03-17 22:56:31] local.DEBUG: Connection closed (1006 - Underlying connection closed)  

Notice the 1h13m between receiving the first 'listening' message and then having the connection close with an error.

Watson says: "1006 indicates that the connection closed abnormally."

https://www.rfc-editor.org/rfc/rfc6455 says:

1006 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint. It is designated for use in applications expecting a status code to indicate that the connection was closed abnormally, e.g., without sending or receiving a Close control frame.
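To make the close log line easier to read, the `'close'` handler could translate the numeric code using the RFC 6455 definitions quoted above. This is a minimal sketch. `describeCloseCode()` is a hypothetical helper, and it covers only a few of the codes listed in RFC 6455 section 7.4.1.

```php
<?php
// Hypothetical helper for the 'close' handler: maps a close code to a short
// explanation based on RFC 6455 section 7.4.1. Only a few codes are covered.
function describeCloseCode(?int $code): string
{
    switch ($code) {
        case 1000:
            return 'normal closure';
        case 1001:
            return 'endpoint going away';
        case 1005:
            return 'no status code was present in the Close frame';
        case 1006:
            // Reserved: never sent on the wire, reported locally by the client.
            return 'abnormal closure: the connection dropped without a Close frame';
        case 1011:
            return 'server hit an unexpected condition';
        default:
            return 'other close code';
    }
}
```

Usage inside the existing handler could look like `Log::debug("Connection closed ({$code} - {$reason}): " . describeCloseCode($code));`. Because 1006 is generated locally, seeing it only tells you that the client observed the TCP connection ending without a Close frame. It does not identify which side (or which proxy in between) dropped it.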

What part of my code could I adjust so that it can handle longer mp3 files without throwing a 1006 error?

\Ratchet\Client\connect($url, [], $headers)->then(function(\Ratchet\Client\WebSocket $conn) use($contentType, $audioFileContents, $callback) {
    $conn->on('message', function($msg) use ($conn, $callback) {
        $this->handleIncomingWebSocketMessage($msg, $conn, $callback);
    });
    $conn->on('close', function($code = null, $reason = null) {
        Log::debug("Connection closed ({$code} - {$reason})");
    });
    $this->startWatsonStream($conn, $contentType);
    $this->sendBinaryMessage($conn, $audioFileContents); 
    Log::debug('send action stop');
    $conn->send(json_encode(['action' => 'stop']));
}, function (\Exception $e) {
    Log::error("Could not connect: {$e->getMessage()} " . $e->getTraceAsString());
});

...

public function handleIncomingWebSocketMessage($msg, $conn, $callback) {
    Log::debug("Received: " . str_limit($msg, 100));
    $msgArray = json_decode($msg, true);
    $state = $msgArray['state'] ?? null;
    if ($state == 'listening') {
        if ($this->listening) {//then this is the 2nd time listening, which means audio processing has finished and has already been sent by server and received by this client.
            Log::debug("FINAL RESPONSE: " . str_limit($this->responseJson, 500));
            $conn->close(\Ratchet\RFC6455\Messaging\Frame::CLOSE_NORMAL, 'Finished.'); 
            $callback($this->responseJson);
        } else {
            $this->listening = true;
            Log::debug("Received first 'listening' message.");
        }
    } else {
        $this->responseJson = $msg;
    }
}

public function sendBinaryMessage($conn, $fileContents) {
    $chunkSizeInBytes = 100; //probably just needs to be <= 4 MB according to Watson's rules
    $chunks = str_split($fileContents, $chunkSizeInBytes);
    Log::debug('Split audio into this many frames: ' . count($chunks));
    $final = true;
    foreach ($chunks as $key => $chunk) {
        $frame = new \Ratchet\RFC6455\Messaging\Frame($chunk, $final, \Ratchet\RFC6455\Messaging\Frame::OP_BINARY);
        $conn->send($frame);
    }

}
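The log shows the 56 MB file being sent as 570,222 separate 100-byte messages. Whether that contributes to the 1006 is not established here, but sending far fewer, larger messages is cheap to try. The sketch below assumes a 1 MiB chunk size (an arbitrary value under the 4 MB limit mentioned in the code comment). `audioChunks()` and `sendAudio()` are hypothetical names, and the `Frame` constructor is used the same way as in the question's code.

```php
<?php
// Hypothetical replacement for the str_split() call in sendBinaryMessage():
// yields the audio in slices of at most $chunkSize bytes without building a
// second full copy of the file in an array.
function audioChunks(string $bytes, int $chunkSize): \Generator
{
    if ($chunkSize <= 0) {
        throw new \InvalidArgumentException('Chunk size must be positive.');
    }
    $length = strlen($bytes);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        yield substr($bytes, $offset, $chunkSize);
    }
}

// Sends each slice as its own complete binary message. The 1 MiB default is
// an assumed value chosen to stay under the 4 MB per-message limit.
function sendAudio($conn, string $bytes, int $chunkSize = 1024 * 1024): int
{
    $sent = 0;
    foreach (audioChunks($bytes, $chunkSize) as $chunk) {
        $conn->send(new \Ratchet\RFC6455\Messaging\Frame(
            $chunk,
            true, // final: each slice is a self-contained message
            \Ratchet\RFC6455\Messaging\Frame::OP_BINARY
        ));
        $sent++;
    }
    return $sent;
}
```

With 1 MiB slices, a 56 MB file becomes roughly 56 messages instead of over half a million. Note that the loop still runs synchronously inside the `connect()` callback, so all of the audio is still queued in the client's write buffer before anything else happens. This only reduces per-message overhead.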
Ryan
  • I would guess that on a `$conn->on('close',` you would want to re-establish the connection. – chughts Mar 19 '19 at 11:58
  • @chughts I'd be curious why smaller files work, then. And my understanding was that it's possible (or maybe even required?) for the session to stay open the entire time until all processing has finished and the entire response has been received by my client. E.g. "You do not need to worry about the session timeout after you send the last chunk to indicate the end of the stream. The service continues to process the audio until it returns the final transcription results. When you transcribe a long audio stream, ... – Ryan Mar 19 '19 at 13:56
  • @chughts "...the service can take more than 30 seconds to process the audio and generate a response. The service does not begin to calculate the session timeout until it finishes processing all audio that it has received. The service's processing time cannot cause the session to exceed the 30-second session timeout." - https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-input#timeouts – Ryan Mar 19 '19 at 13:57
  • I am guessing that it is not the service that is closing the web socket but the underlying web socket infrastructure that is auto-closing it. – chughts Mar 19 '19 at 14:03
  • @chughts Ahh, so you mean my PHP server (which is acting as the *client* for the purposes of websockets) using the Ratchet library. https://stackoverflow.com/a/19305172/470749 similarly says that 1006 is an error caused by the client rather than the websockets server. I'm not yet sure what to look into, but hearing that it's likely something on *my* end (and in my control) rather than IBM's end is encouraging. I guess at a minimum I could try your idea of reconnecting. Thanks. – Ryan Mar 19 '19 at 14:38

1 Answer

As a general recommendation, file-based recognition, especially for files bigger than a few MB, should be done with the asynchronous Watson /recognitions API (more details here: https://cloud.ibm.com/apidocs/speech-to-text). Keeping a connection open for a few hours is not good practice: you could run into a read timeout, lose network connectivity, and so on. With the asynchronous API you POST the file and the connection ends. You can then GET the job status every X minutes, or be notified via a callback, whichever works better for you.

curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @audio-file.flac "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions?callback_url=http://{user_callback_path}/job_results&user_token=job25&timestamps=true"
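A PHP version of the asynchronous flow might look like the following sketch. It assumes ext-curl, a job object with `id` and `status` fields, and `completed` and `failed` as the terminal statuses, as described in the Speech to Text API reference linked above. `recognitionsUrl()`, `jobFinished()`, `watsonRequest()` and `transcribeAsync()` are hypothetical names. The sketch polls instead of using `callback_url`, which (as far as I know) requires registering the callback endpoint with the service first.

```php
<?php
// Hypothetical helpers for the asynchronous /v1/recognitions flow.
// $baseUrl is assumed to be the service URL without "/v1", e.g.
// https://stream.watsonplatform.net/speech-to-text/api (as in the curl above).

// Builds the job-creation URL; booleans are sent as "true"/"false".
function recognitionsUrl(string $baseUrl, array $params): string
{
    foreach ($params as $key => $value) {
        if (is_bool($value)) {
            $params[$key] = $value ? 'true' : 'false';
        }
    }
    return rtrim($baseUrl, '/') . '/v1/recognitions?' . http_build_query($params);
}

// A job is done once the service reports a terminal status.
function jobFinished(array $job): bool
{
    return in_array($job['status'] ?? null, ['completed', 'failed'], true);
}

// Sends one authenticated request and decodes the JSON reply (needs ext-curl).
function watsonRequest(string $url, string $apiKey, ?string $body = null, string $contentType = ''): array
{
    $ch = curl_init($url);
    $options = [
        CURLOPT_USERPWD => 'apikey:' . $apiKey,
        CURLOPT_RETURNTRANSFER => true,
    ];
    if ($body !== null) {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = $body;
        $options[CURLOPT_HTTPHEADER] = ['Content-Type: ' . $contentType];
    }
    curl_setopt_array($ch, $options);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new \RuntimeException('Request failed: ' . curl_error($ch));
    }
    return json_decode($response, true) ?? [];
}

// POSTs the file once, then polls the job instead of holding a socket open.
// Returns the final job object; check its 'status' for 'failed'.
function transcribeAsync(string $baseUrl, string $apiKey, string $path, string $contentType, array $params, int $pollSeconds = 60): array
{
    $job = watsonRequest(recognitionsUrl($baseUrl, $params), $apiKey, file_get_contents($path), $contentType);
    if (!isset($job['id'])) {
        throw new \RuntimeException('Job was not created: ' . json_encode($job));
    }
    $id = $job['id'];
    while (!jobFinished($job)) {
        sleep($pollSeconds);
        $job = watsonRequest(rtrim($baseUrl, '/') . '/v1/recognitions/' . $id, $apiKey);
    }
    return $job;
}
```

For example, `transcribeAsync($baseUrl, $apiKey, 'audio.mp3', 'audio/mpeg', ['timestamps' => true, 'speaker_labels' => true, 'smart_formatting' => true])` would hold an HTTP connection only for the upload and for each short status check. When the job completes, the transcript should be in the returned job's `results` field.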

By the way, is your websockets client using ping/pong frames to keep the connection alive? I noticed that you do not request interim results ({"content-type":"audio\/mpeg","timestamps":true,"speaker_labels":true,"smart_formatting":true,"inactivity_timeout":-1,"interim_results":false,"max_alternatives":1,"word_confidence":false,"action":"start"}). Interim results are another way to keep a connection open, but a less reliable one. Please check the ping/pong frames.
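A ping/pong keepalive with Ratchet's client might be sketched like this. The sketch makes three assumptions: `$loop` is the ReactPHP event loop driving the connection, the connection emits a `'pong'` event when a pong frame arrives, and the 30-second interval and 90-second timeout are illustrative values. Check all three against the library versions you have installed. `PongTracker` and `attachKeepAlive()` are hypothetical names. The point of tracking pongs is to notice a dead connection early, not just to send pings.

```php
<?php
// Hypothetical helper: remembers when the last pong arrived so a silent,
// dead connection can be detected instead of waiting for the OS to report it.
final class PongTracker
{
    private float $lastPong;

    public function __construct(float $now)
    {
        $this->lastPong = $now;
    }

    public function pongReceived(float $now): void
    {
        $this->lastPong = $now;
    }

    // True when no pong has arrived within $timeoutSeconds.
    public function isStale(float $now, float $timeoutSeconds): bool
    {
        return ($now - $this->lastPong) > $timeoutSeconds;
    }
}

// Sketch of wiring the tracker into a Ratchet client connection. Assumes
// $loop is the ReactPHP loop running $conn and that $conn emits 'pong'.
function attachKeepAlive($loop, $conn, float $interval = 30.0, float $timeout = 90.0): void
{
    $tracker = new PongTracker(microtime(true));
    $conn->on('pong', function () use ($tracker) {
        $tracker->pongReceived(microtime(true));
    });

    $timer = null;
    $timer = $loop->addPeriodicTimer($interval, function () use ($loop, $conn, $tracker, $timeout, &$timer) {
        if ($tracker->isStale(microtime(true), $timeout)) {
            // No pong for too long: stop pinging and close the connection.
            $loop->cancelTimer($timer);
            $conn->close();
            return;
        }
        $conn->send(new \Ratchet\RFC6455\Messaging\Frame('', true, \Ratchet\RFC6455\Messaging\Frame::OP_PING));
    });

    // Stop the timer however the connection ends.
    $conn->on('close', function () use ($loop, &$timer) {
        $loop->cancelTimer($timer);
    });
}
```

Compared with the 120-second ping interval mentioned in the comment below, a shorter interval gives idle-timeout proxies less time to drop the connection. Tracking pongs also shows whether the server is still answering during the long processing phase.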

Daniel Bolanos
  • Thanks for your interesting suggestions. +1. I don't know if I've been doing it correctly, but yes, I've been sending a ping every 120 seconds: https://stackoverflow.com/questions/35009726/connection-drop-with-ibm-watson-server?noredirect=1&lq=1#comment97386282_35411963 – Ryan Mar 30 '19 at 13:55