I want to use Google Speech-to-Text on an Amazon Kinesis Video Streams (KVS) stream as part of a voicebot service built with Amazon Connect, Amazon Lex and Amazon SQS. I used the code from https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-mic-recognize-java and changed the reference type from AudioInputStream to InputStream.

I currently use the Amazon Transcribe speech-to-text service, but I want to replace it with Google because Google supports more languages. However, Google Speech can't consume the InputStream object created by the Amazon SDK.

I use the code below. Besides changing AudioInputStream to InputStream, I also tried the AudioSystem.getAudioInputStream() method (also after wrapping the stream in a BufferedInputStream).

    String streamName = streamARN.substring(streamARN.indexOf("/") + 1, streamARN.lastIndexOf("/"));
    InputStream kvsInputStream = KVSUtils.getInputStreamFromKVS(streamName, REGION, startFragmentNum, getAWSCredentials());
    //InputStream bufferedIn = new BufferedInputStream(kvsInputStream); //to solve 'no reset/mark support in stream' error
    //AudioInputStream audioStream = AudioSystem.getAudioInputStream(bufferedIn); //to solve 'no reset/mark support in stream' error
    streamingMicRecognize(kvsInputStream);
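The `substring` call above relies on the KVS stream ARN layout `arn:aws:kinesisvideo:<region>:<account>:stream/<stream-name>/<creation-time>`, so the stream name is whatever sits between the first and the last `/`. A minimal sketch of the same extraction as a standalone helper, assuming that layout; the class name `KvsArn`, the method name and the sample ARN are hypothetical. Unlike the one-liner, it rejects input with fewer than two slashes instead of throwing `StringIndexOutOfBoundsException`:

```java
public class KvsArn {

    // Hypothetical helper: extracts the stream name from a KVS stream ARN of the assumed form
    // arn:aws:kinesisvideo:<region>:<account>:stream/<stream-name>/<creation-time>
    static String streamNameFromArn(String streamArn) {
        int first = streamArn.indexOf('/');
        int last = streamArn.lastIndexOf('/');
        if (first < 0 || last <= first) {
            throw new IllegalArgumentException("Not a KVS stream ARN: " + streamArn);
        }
        // Same slice as streamARN.substring(indexOf("/") + 1, lastIndexOf("/")) in the question
        return streamArn.substring(first + 1, last);
    }

    public static void main(String[] args) {
        // Hypothetical ARN, for illustration only
        String arn = "arn:aws:kinesisvideo:us-east-1:123456789012:stream/connect-demo-contact/1580000000000";
        System.out.println(streamNameFromArn(arn)); // prints connect-demo-contact
    }
}
```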

In its current state, the code fails with:

    com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: The operation was cancelled.

When I used the two commented-out lines (a solution I found on SO), the error was:

    java.lang.ClassCastException: com.amazonaws.util.ServiceClientHolderInputStream cannot be cast to javax.sound.sampled.AudioInputStream
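The cast fails because the SDK returns its own `InputStream` subclass (`ServiceClientHolderInputStream`), which is not an `AudioInputStream`. `AudioSystem.getAudioInputStream(InputStream)` does not help either: it needs a stream that supports mark/reset (hence the `BufferedInputStream`) and it only succeeds if it recognizes a header of a file format it has a reader for (for example WAV, AU or AIFF), while KVS delivers Matroska (MKV) fragments, which the JDK cannot parse. If an `AudioInputStream` is really needed, one option is to declare the format explicitly over raw PCM bytes that have already been extracted from the MKV. A minimal sketch, assuming the audio is 8 kHz, 16-bit, signed, little-endian, mono PCM (an assumption about the Connect audio, to be checked for your setup); the class `PcmWrap` and method `wrapPcm` are hypothetical:

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class PcmWrap {

    // Assumed audio format: 8 kHz, 16-bit, mono, signed, little-endian PCM
    static final AudioFormat CONNECT_PCM = new AudioFormat(8000f, 16, 1, true, false);

    // Hypothetical helper: wraps headerless PCM bytes in an AudioInputStream by stating the
    // format explicitly, instead of casting or asking AudioSystem to detect a header
    static AudioInputStream wrapPcm(byte[] pcm) {
        long frames = pcm.length / CONNECT_PCM.getFrameSize(); // 2 bytes per 16-bit mono frame
        return new AudioInputStream(new ByteArrayInputStream(pcm), CONNECT_PCM, frames);
    }

    public static void main(String[] args) throws Exception {
        byte[] oneSecond = new byte[16000]; // 8000 frames * 2 bytes of silence
        try (AudioInputStream in = wrapPcm(oneSecond)) {
            System.out.println(in.getFrameLength());            // prints 8000
            System.out.println(in.getFormat().getSampleRate()); // prints 8000.0
        }
    }
}
```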

Can you please suggest a solution? For an English voicebot, Connect offers a dedicated block that connects the caller's voice directly to Lex, but Lex supports only US English and I need other languages as well. I know Google Dialogflow ("Google's Lex") can process many languages and offers integration with a phone gateway, but the phone gateway supports English only (which is ridiculous). Thanks in advance.


UPDATE: I solved this problem with the following code:

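    import com.amazonaws.auth.SystemPropertiesCredentialsProvider;
    import com.amazonaws.kinesisvideo.parser.io.InputStreamParserByteSource;
    import com.amazonaws.kinesisvideo.parser.mkv.StreamingMkvReader;
    import com.amazonaws.kinesisvideo.parser.utilities.FragmentMetadataVisitor;
    import com.amazonaws.regions.Regions;
    import com.google.api.gax.rpc.ClientStream;
    import com.google.cloud.speech.v1.*;
    import com.google.protobuf.ByteString;
    import java.io.InputStream;
    import java.nio.ByteBuffer;
    import java.util.Optional;

    InputStream kvsInputStream = KVSUtils.getInputStreamFromKVS(input.getStreamName(), Regions.US_EAST_1, input.getStartFragmentNum(), new SystemPropertiesCredentialsProvider());
    // Parse the MKV fragments coming from KVS instead of passing the raw stream to Google
    StreamingMkvReader streamingMkvReader = StreamingMkvReader.createDefault(new InputStreamParserByteSource(kvsInputStream));
    FragmentMetadataVisitor.BasicMkvTagProcessor tagProcessor = new FragmentMetadataVisitor.BasicMkvTagProcessor();
    FragmentMetadataVisitor fragmentVisitor = FragmentMetadataVisitor.create(Optional.of(tagProcessor));
    ByteBuffer audioBuffer = KVSUtils.getByteBufferFromStream(streamingMkvReader, fragmentVisitor, tagProcessor, input.getConnectContactId());
    SpeechClient client = SpeechClient.create();
    //responseObserver is an instance of ResponseObserver<StreamingRecognizeResponse>() with onXXX methods defined by user -- see Google Speech-To-Text examples
    ClientStream<StreamingRecognizeRequest> clientStream = client.streamingRecognizeCallable().splitCall(responseObserver);
    // The first request must carry only the config (8 kHz LINEAR16 and "en-US" are assumptions; adjust to your audio and language)
    clientStream.send(StreamingRecognizeRequest.newBuilder().setStreamingConfig(StreamingRecognitionConfig.newBuilder()
            .setConfig(RecognitionConfig.newBuilder().setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setSampleRateHertz(8000).setLanguageCode("en-US").build()).build()).build());
    byte[] audioBytes;
    do {
        // Copy the audio bytes of the current fragment into a plain byte array
        audioBytes = new byte[audioBuffer.remaining()];
        audioBuffer.get(audioBytes);
        if (audioBytes.length > 0) { // an empty buffer marks the end of the audio
            clientStream.send(StreamingRecognizeRequest.newBuilder()
                    .setAudioContent(ByteString.copyFrom(audioBytes)).build());
        }
        audioBuffer = KVSUtils.getByteBufferFromStream(streamingMkvReader, fragmentVisitor, tagProcessor, input.getConnectContactId());
    } while (audioBytes.length > 0);
    clientStream.closeSend(); // tell Google no more audio is coming so it can return the final results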

The solution is to go down to the lowest possible level of stream handling, i.e. to parse the MKV fragments from KVS and send Google a plain byte array of audio instead of any stream object.
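The loop in the update boils down to: pull the next `ByteBuffer` of audio parsed out of the MKV fragments, copy it into a plain `byte[]`, send it, and stop when an empty buffer comes back. A minimal, dependency-free sketch of that pattern, in which a `Supplier<ByteBuffer>` stands in for `KVSUtils.getByteBufferFromStream(...)` and a `Consumer<byte[]>` stands in for wrapping the bytes in a `StreamingRecognizeRequest` and calling `clientStream.send(...)`; the class `AudioPump` and method `pump` are hypothetical, and treating an empty buffer as end of stream is taken from the question's loop condition:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AudioPump {

    // Hypothetical helper: copies each buffer's remaining bytes into a fresh array and hands it
    // to the sender; an empty buffer ends the loop and is not forwarded.
    // Returns the total number of audio bytes forwarded.
    static long pump(Supplier<ByteBuffer> nextBuffer, Consumer<byte[]> send) {
        long total = 0;
        ByteBuffer buffer = nextBuffer.get();
        while (buffer.hasRemaining()) {
            byte[] chunk = new byte[buffer.remaining()];
            buffer.get(chunk);  // advances the buffer's position to its limit
            send.accept(chunk); // in the real code: clientStream.send(...)
            total += chunk.length;
            buffer = nextBuffer.get();
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake "fragments": two audio chunks followed by the empty end-of-stream buffer
        Deque<ByteBuffer> fragments = new ArrayDeque<>(List.of(
                ByteBuffer.wrap(new byte[320]),
                ByteBuffer.wrap(new byte[160]),
                ByteBuffer.allocate(0)));
        List<byte[]> sent = new ArrayList<>();
        long total = pump(fragments::poll, sent::add);
        System.out.println(sent.size() + " chunks, " + total + " bytes"); // prints 2 chunks, 480 bytes
    }
}
```

Using a `while` loop with `hasRemaining()` instead of the original `do`/`while` means the final empty buffer is never turned into a request.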

PeterAI
  • Hi there, did you find the solution? I'm working on something similar. It would be great if you could share the code of the solution if you managed to figure it out. Thanks. – limcheekin Feb 04 '20 at 02:47
  • @limcheekin, please see my updated question. The solution is to extract the byte array from the stream instead of using any stream-wrapping object – PeterAI Feb 06 '20 at 09:43
  • Great! Thanks for sharing the solution :) – limcheekin Feb 08 '20 at 02:32