21

I've installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I can't work out how I would add recognition of multiple words.

All I want is for the code to recognize single words, which I can then switch() within the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.

Any help on this would be gratefully received. I have spotted other users having similar problems, but nobody knows the answer so far.


One thing which is baffling me: why do we need to use the "wakeup" constant at all?

private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "oh mighty computer";
.
.
.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

What has wakeup got to do with anything?


I have made some progress (?) : Using addGrammarSearch I am able to use a .gram file to list my words, e.g. up,down,left,right,forwards,backwards, which seems to work well if all I say are those particular words. However, any other words will cause the system to match what is said to the "nearest" word from those stated. Ideally I don't want recognition to occur if words spoken are not in the .gram file...

  • i read this question, but i can't find my answer. i do lots of searches too. i ask everyone who can help me, please see http://stackoverflow.com/q/37629636/3671748 – Mina Dahesh Jun 04 '16 at 11:29
  • i read this, but my problem is how can i define new KEYWORD -e.g. my phone- too. would you please check my question? http://stackoverflow.com/q/37629636/3671748 – Mina Dahesh Jun 04 '16 at 13:37
  • can u help me please ? : http://stackoverflow.com/questions/39506271/how-can-i-add-custom-dictionaries-into-pocketsphinx-android – S.M_Emamian Sep 15 '16 at 08:38

3 Answers

21

Thanks to Nikolay's tip (see his answer), I have developed the following code, which works fine and does not recognize words unless they're on the list. You can copy and paste this directly over the main class in the PocketSphinxDemo code:

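// Relies on the imports already present in the demo file, including the
// static imports of Toast.makeText and SpeechRecognizerSetup.defaultSetup.
public class PocketSphinxActivity extends Activity implements RecognitionListener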
{
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

@Override
public void onCreate(Bundle state)
{
    super.onCreate(state);

    setContentView(R.layout.main);

    ((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");

    try
    {
        Assets assets = new Assets(PocketSphinxActivity.this);
        File assetDir = assets.syncAssets();
        setupRecognizer(assetDir);
    }
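    catch (IOException e)
    {
        // Setup failed: report it and stop here, because recognizer is still
        // null and the reset() call below would throw a NullPointerException.
        ((TextView) findViewById(R.id.caption_text)).setText("Failed to init recognizer: " + e);
        return;
    }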

    ((TextView) findViewById(R.id.caption_text)).setText("Say up, down, left, right, forwards, backwards");

    reset();
}

@Override
public void onPartialResult(Hypothesis hypothesis)
{
}

@Override
public void onResult(Hypothesis hypothesis)
{
    ((TextView) findViewById(R.id.result_text)).setText("");

    if (hypothesis != null)
    {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
    }
}

@Override
public void onBeginningOfSpeech()
{
}

@Override
public void onEndOfSpeech()
{
    reset();
}

private void setupRecognizer(File assetsDir)
{
    File modelsDir = new File(assetsDir, "models");

    recognizer = defaultSetup().setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                               .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                               .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                               .getRecognizer();

    recognizer.addListener(this);

    File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}

private void reset()
{
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
}

Your digits.gram file should be something like:

up /1e-1/
down /1e-1/
left /1e-1/
right /1e-1/
forwards /1e-1/
backwards /1e-1/

You should experiment with the thresholds between the slashes to tune detection, where 1e-1 is scientific notation for 0.1. I think the maximum is 1.0.
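To take the guesswork out of the notation, here is a minimal sketch in plain Java (it does not touch PocketSphinx at all) that formats a keyword list in the layout shown above and checks that 1e-1 parses as 0.1. The class name KeywordList and the method format are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeywordList {
    // Builds keyword-list text: one "phrase /threshold/" entry per line.
    // Thresholds are kept as strings so they are written exactly as typed.
    static String format(Map<String, String> thresholds) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : thresholds.entrySet()) {
            sb.append(entry.getKey())
              .append(" /").append(entry.getValue()).append("/\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the words in insertion order.
        Map<String, String> words = new LinkedHashMap<>();
        words.put("up", "1e-1");
        words.put("down", "1e-1");
        words.put("left", "1e-1");
        words.put("right", "1e-1");
        System.out.print(format(words));

        // Scientific notation check: 1e-1 and 0.1 are the same number.
        System.out.println(Double.parseDouble("1e-1") == 0.1);
    }
}
```

Writing the result of format to digits.gram gives a file in the same layout as the one above, which makes it easier to try several threshold values in turn.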

And it's 5.30pm so I can stop working now. Result.

  • 1
    Thanks man!! These lines made the difference. I had not seen addKeywordSearch (not addKeywordsSearch, in plural): File digitsGrammar = new File(modelsDir, "grammar/digits.gram"); recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar); } private void reset() { recognizer.stop(); recognizer.startListening(DIGITS_SEARCH); } } – Josh Mar 20 '15 at 17:05
  • 2
    @pbs: Thanks for sharing your solution, it helped me a lot! I have one question though. Does your modified digits.gram contain anything else, or just the key words with the //? Because I get an exception, when trying to open and parse the digits.gram file. – Silex Jun 09 '15 at 02:03
  • You could try `up /1/ down /1/ left /1/ right /1/`, with carriage returns after the `/1/`'s. –  Jun 09 '15 at 06:48
  • 1
    Now it runs, but I still have the problem, that if I say something totally different which is not in my grammar file it still tries to fit the closest match, therefore whatever I say I get a match, which is not too user friendly. This is how my digits.gram file looks like: #JSGF V1.0; grammar digits; public = /1/ start | /1/ stop | /1/ frame; – Silex Jun 09 '15 at 22:17
  • 2
    I found my mistake... I wasn't using "addKeywordSearch", I was using addGrammarSearch... now I changed my grammar file to exactly what you have in your post above and it runs... but unfortunately I still get false positive results... so if I say something there will always be a match, even if I say something totally different. – Silex Jun 09 '15 at 22:42
  • As @Silex stated, same happens with me as well, Hypothesis returns values from .gram file without even speaking something. – Chitrang Nov 19 '15 at 09:59
  • @Josh Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! http://stackoverflow.com/questions/35388720/cant-start-service-speech-recog – Ruchir Baronia Feb 14 '16 at 23:42
  • @Chitrang Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! http://stackoverflow.com/questions/35388720/cant-start-service-speech-recog – Ruchir Baronia Feb 14 '16 at 23:43
  • i am facing a problem it is listening words without saying anything – Mansuu.... Mar 22 '16 at 06:50
  • @chitrang in my case hypothesis returns values from .gram file without even speaking something or speaking something else .how to get rid of this issue? – Mansuu.... Mar 29 '16 at 09:23
  • can we use local language words for speech recognition? – Mansuu.... Mar 30 '16 at 07:40
  • the higher the threshold, the more accurate you must speak? Or it is viceversa? @poirot – Naramsim Apr 03 '16 at 19:47
  • @Naramsim I'm not sure to be honest, and I haven't done anything with this code for over a year so don't recall the details. Maybe there are some docs on this somewhere... sorry I can't be of more help. –  Apr 03 '16 at 20:07
  • you had created your own dictionary or you added your words in existing dictionary? – Mansuu.... Apr 15 '16 at 05:40
  • do i need to build acoustic model, lm files and dictionary to search words. – Mansuu.... Apr 15 '16 at 10:11
  • I didn't have to build an acoustic model. I just used the files in my answer above. That's it. Any other files required came with the package. I just set the whole thing up by downloading and adding to eclipse project. The only "technical stuff" I did is mentioned in the answer. –  Apr 15 '16 at 13:50
  • Try /1e-1/ in the gram file. I vaguely recall other values did not work for me. It was a long time ago. –  Apr 16 '16 at 09:18
14

You can use addKeywordSearch, which takes a file of keyphrases: one phrase per line, with a threshold for each phrase between slashes. For example:

up /1.0/
down /1.0/
left /1.0/
right /1.0/
forwards /1e-1/

Each threshold must be tuned to avoid false alarms.
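Since a JSGF grammar pasted into this file will not parse, it may help to check the layout before handing the file to the recognizer. The following is only a sketch in plain Java: the class name KeywordListCheck and the method badLines are hypothetical, the pattern is simply my reading of the "phrase /threshold/" layout above, and skipping blank lines is an assumption of the sketch rather than something the recognizer is known to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordListCheck {
    // A phrase, some whitespace, then a value between two slashes.
    private static final Pattern ENTRY =
            Pattern.compile("^(.+?)\\s+/([^/]+)/\\s*$");

    // Returns the 1-based numbers of the lines that do not fit the layout.
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored by this sketch
            }
            Matcher m = ENTRY.matcher(line);
            boolean ok = m.matches();
            if (ok) {
                try {
                    // The threshold must at least be a readable number.
                    Double.parseDouble(m.group(2));
                } catch (NumberFormatException e) {
                    ok = false;
                }
            }
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "up /1.0/", "forwards /1e-1/", "#JSGF V1.0;", "down");
        System.out.println(badLines(lines)); // lines 3 and 4 are rejected
    }
}
```

This only checks the shape of each line. It says nothing about whether a given threshold is a good one; that still has to be found by testing.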

Nikolay Shmyrev
  • 2
    Can you share the entire text inside your .gram file please? I feel that something else is missing. I am new to grammar files. – Josh Mar 20 '15 at 16:30
  • There is nothing to update, this file is a file for keyword spotting as is, you should not add anything. And it is not grammar file, grammars are different. To learn about keyword spotting visit CMUSphinx page http://cmusphinx.sourceforge.net/wiki/tutoriallm – Nikolay Shmyrev Nov 19 '15 at 14:15
  • Assuming I use such a file with pocketsphinx_continuous, I would provide the file path using `-kws`. Could I then use `cmudict-en-us.dict` and the included 16-bit PTM `en-us` ARPA model? Would the accuracy improve if I created a new dictionary for just those 5 words? – user2023370 Jan 11 '16 at 00:53
  • 1
    en-us-ptm is an acoustic model, it is not arpa model. it is 16khz, not 16 bit. creating new dictionary would not improve the accuracy, though it might save you some memory (about 3mb). – Nikolay Shmyrev Jan 11 '16 at 00:58
  • Yes indeed, 16khz acoustic. What is the significance of making the threshold for `forwards` different from the others? Why not denote it as `/1e-1/` rather than `/0.1/`? – user2023370 Jan 11 '16 at 08:56
  • 1
    The threshold depends on the word, for optimal detection you need to use word-specific thresholds. Since word "forwards" has two syllables, it most likely needs a different threshold. You can use 0.1 if you like. – Nikolay Shmyrev Jan 11 '16 at 08:57
  • Are there any examples of such files in pocketsphinx? Do they have a file extension? – user2023370 Jan 11 '16 at 14:18
  • Example is provided in the answer. You do not need extension, you can choose arbitrary one according to your preferences. – Nikolay Shmyrev Jan 11 '16 at 15:01
  • why we need threshold,can anyone tell me – Mansuu.... Mar 22 '16 at 06:44
  • i am facing a problem it is listening words without saying anything – Mansuu.... Mar 22 '16 at 07:19
  • Threshold controls false alarms, if you have too many detections simply change threshold. – Nikolay Shmyrev Mar 22 '16 at 10:05
  • can we use local language words for speech recognition? – Mansuu.... Mar 30 '16 at 07:30
  • @NikolayShmyrev . i read this, but my problem is how can i define new KEYWORD -e.g. my phone- too. would you please check my question? http://stackoverflow.com/q/37629636/3671748 – Mina Dahesh Jun 04 '16 at 13:38
  • can u help me please : http://stackoverflow.com/questions/39506271/how-can-i-add-custom-dictionaries-into-pocketsphinx-android – S.M_Emamian Sep 15 '16 at 08:38
0

I am working on updating Antinous's amendment to the PocketSphinx demo so that it runs in Android Studio. This is what I have so far:

//Note: change MainActivity to PocketSphinxActivity for demo use...
public class MainActivity extends Activity implements RecognitionListener {
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

/* Used to handle permission request */
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;

@Override
public void onCreate(Bundle state) {
    super.onCreate(state);

    setContentView(R.layout.main);
    ((TextView) findViewById(R.id.caption_text))
            .setText("Preparing the recognizer");

    // Check if user has given permission to record audio
    int permissionCheck = ContextCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.RECORD_AUDIO);
    if (permissionCheck != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, PERMISSIONS_REQUEST_RECORD_AUDIO);
        // Note: there is no onRequestPermissionsResult handler yet, so the
        // recognizer is only set up the next time the activity is created.
        return;
    }

    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(MainActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }
        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                ((TextView) findViewById(R.id.caption_text))
                        .setText("Failed to init recognizer " + result);
            } else {
                reset();
            }
        }
    }.execute();
    ((TextView) findViewById(R.id.caption_text)).setText("Say one, two, three, four, five, six...");
}

/**
 * In partial result we get quick updates about current hypothesis. In
 * keyword spotting mode we can react here, in other modes we need to wait
 * for final result in onResult.
 */

@Override
public void onPartialResult(Hypothesis hypothesis) {
    // Nothing recognized yet, or the recognizer is not ready.
    if (hypothesis == null || recognizer == null) {
        return;
    }
    //recognizer.rapidSphinxPartialResult(hypothesis.getHypstr());
    String text = hypothesis.getHypstr();
    // Caution: DIGITS_SEARCH is the name of the search ("digits"), not a
    // keyword, so this branch only runs if the hypothesis text is exactly "digits".
    if (text.equals(DIGITS_SEARCH)) {
        recognizer.cancel();
        performAction();
        recognizer.startListening(DIGITS_SEARCH);
    } else {
        //Toast.makeText(getApplicationContext(),"Partial result = " +text,Toast.LENGTH_SHORT).show();
    }
}
@Override
public void onResult(Hypothesis hypothesis) {
    ((TextView) findViewById(R.id.result_text)).setText("");
    if (hypothesis != null) {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), "Hypothesis: " + text, Toast.LENGTH_SHORT).show();
    } else {
        makeText(getApplicationContext(), "hypothesis = null", Toast.LENGTH_SHORT).show();
    }
}
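@Override
public void onDestroy() {
    super.onDestroy();
    // recognizer is null if setup never ran (e.g. permission was not granted).
    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }
}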
@Override
public void onBeginningOfSpeech() {
}
@Override
public void onEndOfSpeech() {
   reset();
}
@Override
public void onTimeout() {
}
private void setupRecognizer(File assetsDir) throws IOException {
    // The recognizer can be configured to perform multiple searches
    // of different kind and switch between them
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
            .getRecognizer();
    recognizer.addListener(this);

    File digitsGrammar = new File(assetsDir, "digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}
private void reset(){
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
@Override
public void onError(Exception error) {
    ((TextView) findViewById(R.id.caption_text)).setText(error.getMessage());
}

public void performAction() {
    // do here whatever you want
    makeText(getApplicationContext(), "performAction done... ", Toast.LENGTH_SHORT).show();
}
}
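One way to fill in performAction() would be to pass it the recognized text and switch on it, which is what the question asks for. The sketch below is plain Java and untested against the recognizer itself: the Command enum and the parse method are hypothetical names, and it assumes the hypothesis string may arrive with surrounding whitespace or in a different letter case.

```java
import java.util.Locale;

public class CommandDispatch {
    // The six words from the question, plus a catch-all for anything else.
    enum Command { UP, DOWN, LEFT, RIGHT, FORWARDS, BACKWARDS, UNKNOWN }

    // Maps recognized text to a command; unknown or missing text is UNKNOWN.
    static Command parse(String hypstr) {
        if (hypstr == null) {
            return Command.UNKNOWN;
        }
        // Normalize before matching (assumption: stray spaces are possible).
        switch (hypstr.trim().toLowerCase(Locale.ROOT)) {
            case "up":        return Command.UP;
            case "down":      return Command.DOWN;
            case "left":      return Command.LEFT;
            case "right":     return Command.RIGHT;
            case "forwards":  return Command.FORWARDS;
            case "backwards": return Command.BACKWARDS;
            default:          return Command.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("left "));  // prints LEFT
        System.out.println(parse("hello"));  // prints UNKNOWN
    }
}
```

In the activity, the string passed to parse would be hypothesis.getHypstr(), and the UNKNOWN case gives one place to ignore anything unexpected.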

Caveat emptor: this is a work in progress. Check back later. Suggestions would be appreciated.

portsample