
I ran the sample code from the README.md of tryolabs/TLSphinx. The text property of the resulting Hypothesis is whitespace, and the score property is a negative number, -4420.

Why am I not getting any recognized text in the text property of the Hypothesis?

Here is my code:

let hmm = localDocumentsURL.path // Path to the acoustic model
let lm = localDocumentsURL.appendingPathComponent("6844").appendingPathExtension("lm").path // Path to the language model
let dict = localDocumentsURL.appendingPathComponent("cmudict-en-us").appendingPathExtension("dict").path // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder: TLSphinx.Decoder = TLSphinx.Decoder(config:config) {

        let audioFile = Bundle.main.path(forResource: "audio16000", ofType: "wav")! // Path to an audio file

        do {
            try decoder.decodeSpeech(atPath: audioFile) {

                if let hyp: Hypothesis = $0 {
                    // Print the decoder text and score
                    print("Text: \(hyp.text) - Score: \(hyp.score)")
                } else {
                    // Can't decode any speech because of an error
                }
            }
        } catch {
            print(error)
        }
    } else {
        // Handle Decoder() fail
        print("Decoder fail")
    }
} else {
    // Handle Config() fail
    print("Config fail")
}

The debug output was longer than Stack Overflow allows in a post, so I haven't included it.

I still get the same result as when I used an mp3 file, except that with the mp3 file the text was an empty string rather than whitespace. I used Audacity to convert the mp3 file to WAV with a 16000 Hz sample rate, signed 16-bit PCM encoding (16-bit depth), and a single mono channel. Those are the required specifications.
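One way to confirm that the exported file really has those settings is to read its WAV header rather than trusting the export dialog. The following is a minimal sketch (not part of TLSphinx) that walks the RIFF chunks, reads the "fmt " chunk, and checks for 16 kHz, mono, 16-bit integer PCM. It assumes a plain PCM header (format code 1); a file written with WAVE_FORMAT_EXTENSIBLE would be reported as incompatible by this sketch even if the samples are fine.

```swift
import Foundation

/// Format fields read from a WAV file's "fmt " chunk.
struct WavFormat: Equatable {
    let audioFormat: UInt16   // 1 = integer PCM
    let channels: UInt16
    let sampleRate: UInt32
    let bitsPerSample: UInt16
}

/// Walks the RIFF chunks and returns the "fmt " chunk fields,
/// or nil if the data is not a RIFF/WAVE file with a readable fmt chunk.
func parseWavFormat(_ data: Data) -> WavFormat? {
    let bytes = [UInt8](data)

    // Reads a little-endian unsigned integer of `count` bytes at `offset`.
    func le(_ offset: Int, _ count: Int) -> UInt32 {
        var value: UInt32 = 0
        for i in 0..<count {
            value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
        }
        return value
    }

    // Reads a 4-character chunk ID at `offset`.
    func tag(_ offset: Int) -> String {
        return String(decoding: bytes[offset..<offset + 4], as: UTF8.self)
    }

    guard bytes.count >= 12, tag(0) == "RIFF", tag(8) == "WAVE" else { return nil }

    // Each chunk: 4-byte ID, 4-byte little-endian size, payload padded to an even length.
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = tag(offset)
        let size = Int(le(offset + 4, 4))
        let body = offset + 8
        if id == "fmt " {
            guard size >= 16, body + 16 <= bytes.count else { return nil }
            return WavFormat(audioFormat: UInt16(le(body, 2)),
                             channels: UInt16(le(body + 2, 2)),
                             sampleRate: le(body + 4, 4),
                             bitsPerSample: UInt16(le(body + 14, 2)))
        }
        offset = body + size + (size % 2)
    }
    return nil
}

/// True if the format is 16 kHz, mono, 16-bit integer PCM.
func isSphinxCompatible(_ f: WavFormat) -> Bool {
    return f.audioFormat == 1 && f.channels == 1
        && f.sampleRate == 16_000 && f.bitsPerSample == 16
}
```

For the file in the question, something like FileManager.default.contents(atPath: audioFile) would supply the Data to pass to parseWavFormat. Walking the chunks, instead of assuming a fixed 44-byte header, matters because editors can write extra chunks (for example LIST metadata) before "fmt ".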

daniel

2 Answers

Why is the text empty?

You used the wrong input file format. It should be WAV, not MP3.

Why is the score so low?

It is not low. It is expected to be negative, since it is a logarithmic score.
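To see why a log-domain score is negative, it can be mapped back to the linear scale. This is a sketch assuming the decoder ran with PocketSphinx's default -logbase of 1.0001 (check your own config if you changed it), and assuming the score is passed as an Int; the exact Swift type TLSphinx uses for Hypothesis.score may differ. A value at or below 1 on the linear scale always has a logarithm at or below 0, so a negative score is normal. The converted number is still a raw path score, not a calibrated confidence.

```swift
import Foundation

/// Converts a log-domain decoder score to the linear scale,
/// assuming the default PocketSphinx log base of 1.0001.
func linearScore(fromLogScore score: Int, logBase: Double = 1.0001) -> Double {
    // score = log_base(x)  =>  x = base^score = exp(score * ln(base))
    return exp(Double(score) * log(logBase))
}
```

Under that assumption, linearScore(fromLogScore: -4420) comes out at roughly 0.64, and scores closer to zero correspond to larger linear values.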

How would I fix this so that I get a text result and a high score?

Use the proper input format.

Nikolay Shmyrev
I am still getting the same result as before after switching to a WAV file. I used Audacity to convert my mp3 file to WAV at a 16000 Hz sample rate, signed 16-bit PCM format, 16-bit depth, and a mono audio channel. Those are the required specifications. – daniel Sep 24 '19 at 17:53

My mistake was using an incorrect model file: I had the wrong dictionary file. It needs to be "cmudict-en-us.dict", as you used.

Maybe you are using the wrong language model file. "6844.lm" doesn't work for me either, but "en-us.lm.dmp" does.

For others asking where to find these files: https://github.com/tryolabs/TLSphinx/tree/master/Sphinx/share/pocketsphinx/model/en-us
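A hypothetical helper that builds all three paths from a single model directory makes this kind of mix-up easier to spot. It assumes you copied the linked en-us folder into your app, and that inside it the acoustic model is the "en-us" subdirectory next to "en-us.lm.dmp" and "cmudict-en-us.dict"; that layout is an assumption, so check it against your copy. The helper also reports which paths don't exist, since a wrong path otherwise only shows up as a failing or empty decode.

```swift
import Foundation

/// Hypothetical helper: the three model paths TLSphinx's Config needs,
/// assuming `modelDir` points at a local copy of the en-us model folder.
struct SphinxModelPaths {
    let hmm: String   // acoustic model directory (assumed "en-us" subfolder)
    let lm: String    // language model (assumed "en-us.lm.dmp")
    let dict: String  // pronunciation dictionary ("cmudict-en-us.dict")

    init(modelDir: URL) {
        hmm = modelDir.appendingPathComponent("en-us").path
        lm = modelDir.appendingPathComponent("en-us.lm.dmp").path
        dict = modelDir.appendingPathComponent("cmudict-en-us.dict").path
    }

    /// Paths that don't exist on disk, checked before creating Config/Decoder.
    func missingPaths(fileManager: FileManager = .default) -> [String] {
        return [hmm, lm, dict].filter { !fileManager.fileExists(atPath: $0) }
    }
}
```

If missingPaths() comes back empty, the strings can go into the same call used in the question: Config(args: ("-hmm", paths.hmm), ("-lm", paths.lm), ("-dict", paths.dict)).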

chocolate cake