I already use HTK (the Hidden Markov Model Toolkit) to recognize specific voice commands that control my Android application, but this requires sending the voice data to a server, which may add latency.
To avoid this latency, I am thinking about using pocketsphinx to recognize the voice data locally in the Android application, so that I won't need to send the audio to the server.
If this is a good idea, is it easy to learn pocketsphinx from scratch? Also, what are the advantages and disadvantages of the two approaches (server-based and local voice recognition), and which one is better?