I want to use Google Speech-to-Text on audio coming from a Kinesis Video Stream, as part of a voicebot service built with Amazon Connect, Amazon Lex and Amazon SQS. I started from the code at https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-mic-recognize-java and changed the reference type from AudioInputStream to InputStream.
Currently I use the Amazon Transcribe speech-to-text service, but I want to replace it with Google's, because Google supports more languages. However, Google Speech can't accept the InputStream object created by the Amazon SDK.
My code is below. Besides changing AudioInputStream to InputStream, I also tried the AudioSystem.getAudioInputStream() method (wrapping the stream in a BufferedInputStream first).
String streamName = streamARN.substring(streamARN.indexOf("/") + 1, streamARN.lastIndexOf("/"));
InputStream kvsInputStream = KVSUtils.getInputStreamFromKVS(streamName, REGION, startFragmentNum, getAWSCredentials());
// the two lines below were an attempt to fix the "no reset/mark support in stream" error:
//InputStream bufferedIn = new BufferedInputStream(kvsInputStream);
//AudioInputStream audioStream = AudioSystem.getAudioInputStream(bufferedIn);
streamingMicRecognize(kvsInputStream);
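For reference, here is roughly what the modified streamingMicRecognize looks like. This is only a sketch: the chunk size is arbitrary, the language code is a placeholder, and responseObserver is the ResponseObserver<StreamingRecognizeResponse> defined as in the Google sample.
static void streamingMicRecognize(InputStream audio) throws Exception {
    try (SpeechClient client = SpeechClient.create()) {
        // responseObserver is defined as in the Google sample
        ClientStream<StreamingRecognizeRequest> clientStream =
                client.streamingRecognizeCallable().splitCall(responseObserver);

        // the first request carries only the configuration, no audio
        RecognitionConfig recognitionConfig = RecognitionConfig.newBuilder()
                .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                .setSampleRateHertz(8000) // Amazon Connect publishes 8 kHz PCM to KVS
                .setLanguageCode("en-US") // placeholder; set to the required language
                .build();
        clientStream.send(StreamingRecognizeRequest.newBuilder()
                .setStreamingConfig(StreamingRecognitionConfig.newBuilder()
                        .setConfig(recognitionConfig)
                        .build())
                .build());

        // subsequent requests carry audio chunks read from the InputStream
        // (this replaces the microphone TargetDataLine loop of the sample)
        byte[] buffer = new byte[3200]; // arbitrary chunk size
        int read;
        while ((read = audio.read(buffer)) > 0) {
            clientStream.send(StreamingRecognizeRequest.newBuilder()
                    .setAudioContent(ByteString.copyFrom(buffer, 0, read))
                    .build());
        }
        clientStream.closeSend();
    }
}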
In its current state, I get the following error:
com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: The operation was cancelled.
When I used the two commented-out lines (a solution I found on SO), the error was:
java.lang.ClassCastException: com.amazonaws.util.ServiceClientHolderInputStream cannot be cast to javax.sound.sampled.AudioInputStream
Can you please suggest a solution? For an English voicebot, Connect offers a dedicated block that connects the phone call's audio to Lex, but Lex supports only US English and I need other languages as well. I know Google Dialogflow ("Google's Lex") can process many languages and offers an integration with a phone gateway, but the phone gateway supports English only (which is ridiculous). Thanks in advance.
UPDATE: I solved the problem with the following code:
InputStream kvsInputStream = KVSUtils.getInputStreamFromKVS(input.getStreamName(), Regions.US_EAST_1, input.getStartFragmentNum(), new SystemPropertiesCredentialsProvider());
StreamingMkvReader streamingMkvReader = StreamingMkvReader.createDefault(new InputStreamParserByteSource(kvsInputStream));
FragmentMetadataVisitor.BasicMkvTagProcessor tagProcessor = new FragmentMetadataVisitor.BasicMkvTagProcessor();
FragmentMetadataVisitor fragmentVisitor = FragmentMetadataVisitor.create(Optional.of(tagProcessor));
ByteBuffer audioBuffer = KVSUtils.getByteBufferFromStream(streamingMkvReader, fragmentVisitor, tagProcessor, input.getConnectContactId());

SpeechClient client = SpeechClient.create();
// responseObserver is an instance of ResponseObserver<StreamingRecognizeResponse> with the user-defined
// onXXX methods -- see the Google Speech-to-Text examples (a minimal version is sketched below)
ClientStream<StreamingRecognizeRequest> clientStream = client.streamingRecognizeCallable().splitCall(responseObserver);

// the first request must carry only the streaming configuration, no audio
StreamingRecognizeRequest request = StreamingRecognizeRequest.newBuilder()
        .setStreamingConfig(StreamingRecognitionConfig.newBuilder()
                .setConfig(RecognitionConfig.newBuilder()
                        .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                        .setSampleRateHertz(8000) // Amazon Connect publishes 8 kHz PCM audio to KVS
                        .setLanguageCode("en-US") // set to the required language
                        .build())
                .build())
        .build();
clientStream.send(request);

byte[] audioBytes;
int counter = 0;
do {
    // copy the buffer's contents into a plain byte array and send it as the next audio chunk
    audioBytes = new byte[audioBuffer.remaining()];
    audioBuffer.get(audioBytes);
    request = StreamingRecognizeRequest.newBuilder()
            .setAudioContent(ByteString.copyFrom(audioBytes))
            .build();
    clientStream.send(request);
    // pull the next chunk of decoded audio out of the MKV stream
    audioBuffer = KVSUtils.getByteBufferFromStream(streamingMkvReader, fragmentVisitor, tagProcessor, input.getConnectContactId());
    counter++;
} while (audioBytes.length > 0);
The solution is to go down to the lowest possible level of stream handling, i.e. to extract plain byte arrays from the stream instead of passing stream objects around.
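For completeness, a minimal responseObserver can look like the sketch below; it just prints each transcript as it arrives (the error handling is a placeholder, and in a real voicebot onResponse would forward the results to the bot logic):
ResponseObserver<StreamingRecognizeResponse> responseObserver =
        new ResponseObserver<StreamingRecognizeResponse>() {
            @Override
            public void onStart(StreamController controller) {
                // nothing to do when the stream opens
            }

            @Override
            public void onResponse(StreamingRecognizeResponse response) {
                // print the top alternative of each result as it arrives
                if (response.getResultsCount() > 0) {
                    StreamingRecognitionResult result = response.getResults(0);
                    if (result.getAlternativesCount() > 0) {
                        System.out.println(result.getAlternatives(0).getTranscript());
                    }
                }
            }

            @Override
            public void onError(Throwable t) {
                System.err.println("Speech-to-Text error: " + t);
            }

            @Override
            public void onComplete() {
                // stream closed by the server
            }
        };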