I'm trying to get Watson Speech to Text (in an IBM Voice Gateway IVR app) to recognize alphanumeric character strings. Could I create a custom grammar or entity that restricts STT to recognizing only individual letters and numbers, excluding words altogether? For example, here's a typical string: 20Y0H8C. Watson comes back with words and numbers, like "two" instead of "2". Digit strings work fine. I realize that letter recognition is problematic with typical ASR, but I'm hoping Watson is up to the task. I noticed there are no system entities for alphanumeric characters. Any suggestions are much appreciated.
1 Answer
In this case, set smart_formatting to true.
The smart_formatting parameter converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more conventional representations in the final transcript of a recognition request. The conversion makes the transcript more readable and enables better post-processing of the transcription results. You set the parameter to true to enable smart formatting, as in the following example; by default, the parameter is false and smart formatting is not performed.
Check:
curl -X POST -u {username}:{password} \
  --header "Content-Type: audio/flac" \
  --data-binary @{path}audio-file.flac \
  "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?smart_formatting=true"
Result:
Spoken: The quantity is one million one hundred and one
Transcript: The quantity is 1000101
See the official IBM documentation.
Note: The smart formatting feature is currently beta functionality that is available for US English only.
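If you would rather build the same request from a script than from curl, here is a minimal Python sketch using only the standard library. The endpoint, content type, and smart_formatting query parameter are copied from the curl example above; the function name build_recognize_request is made up for illustration, and the URL and basic-auth credentials may differ for your service instance. The function only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint copied from the curl example; it may differ for your instance.
BASE_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"


def build_recognize_request(username, password, audio_bytes, smart_formatting=True):
    """Build (but do not send) a recognize request.

    Hypothetical helper for illustration, not part of any IBM SDK.
    """
    # Query string carries smart_formatting=true or smart_formatting=false.
    query = urllib.parse.urlencode(
        {"smart_formatting": str(smart_formatting).lower()}
    )
    # Equivalent of curl's -u {username}:{password} (HTTP basic auth).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{BASE_URL}?{query}",
        data=audio_bytes,  # raw FLAC bytes, like --data-binary
        method="POST",
        headers={
            "Content-Type": "audio/flac",
            "Authorization": f"Basic {token}",
        },
    )
```

To actually send it, read the audio file in binary mode, pass the bytes to the function, and hand the returned object to urllib.request.urlopen.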

- Thanks for your answer, but the issue is when letters are spoken in the string. smart_formatting is already enabled, but it does nothing for alphanumeric strings. I've also tried using input.text.match("^[a-zA-Z0-9]*$"), which works in a chat window but is hit or miss with STT. The goal is to get Watson to accept only alphanumeric strings, which would tightly constrain the scope. The data is fixed-length strings (7 characters), and the letters can be anywhere. For example: HV00310. – Wilson the Dog Jun 14 '17 at 19:45
- I should note that I'm using IBM Voice Gateway (STT is narrowband). – Wilson the Dog Jun 14 '17 at 20:02
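One possible workaround for the regex check being hit or miss on STT output is to normalize the transcript before validating it. The sketch below is an assumption-laden illustration, not a Watson feature: the word-to-digit table and the function name normalize_code are hypothetical, and the 7-character pattern comes from the fixed-length format described in the comment above. It maps spelled-out digits back to characters, joins the pieces, and accepts the result only if it is exactly 7 letters or digits.

```python
import re

# Spoken digit words mapped to characters. This table is an assumption for
# illustration; it is not a documented list of what Watson returns.
WORD_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Fixed-length codes: exactly 7 uppercase letters or digits.
CODE_PATTERN = re.compile(r"[A-Z0-9]{7}")


def normalize_code(transcript):
    """Collapse a transcript into a 7-character code, or return None."""
    chars = []
    for token in re.findall(r"[A-Za-z0-9]+", transcript):
        lowered = token.lower()
        if lowered in WORD_TO_CHAR:
            # A spelled-out digit such as "two".
            chars.append(WORD_TO_CHAR[lowered])
        elif len(token) == 1 or any(ch.isdigit() for ch in token):
            # A single letter or digit, or a run that contains digits.
            chars.append(token.upper())
        else:
            # Any other multi-letter word means the utterance was not a code.
            return None
    code = "".join(chars)
    return code if CODE_PATTERN.fullmatch(code) else None


print(normalize_code("two zero Y zero H eight C"))  # 20Y0H8C
print(normalize_code("H V 00310"))                  # HV00310
print(normalize_code("hello world"))                # None
```

Returning None rather than a best guess lets the dialog re-prompt the caller when the transcript does not reduce to a valid code. Letters that the recognizer hears as words (for example "why" for Y) are not handled here and would need their own table.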