Guide Area

Android Tutorial – Continuous Speech Recognition with Pocketsphinx

During my latest project (Smart Mirror), I wanted to implement continuous speech recognition that would keep listening without stopping. I spent a lot of time looking for a library that works well, and two of them are worth mentioning: DroidSpeech and Pocketsphinx. DroidSpeech is a nice Android library that gives you continuous speech recognition, although parts of it were not as configurable as I had hoped. Pocketsphinx came in to save the day.

INTRODUCTION TO SPEECH RECOGNITION

As I had no experience with speech recognition libraries before I started this project, implementing this feature was a bit complicated and time-consuming for me. I could not find a step-by-step tutorial that would make things easier and faster, which is why I’m putting together a small walk-through.

This article describes the usage of a library called Pocketsphinx, which provides this functionality. I suggest that you read the following pages before we go further, so you have a fundamental understanding of how the library works: https://github.com/cmusphinx/pocketsphinx and https://cmusphinx.github.io/wiki/tutorialandroid/

In case you want to check out Vikram Ezhil’s DroidSpeech, you can proceed to this URL: https://github.com/vikramezhil/DroidSpeech

PREPARATIONS

These are the first steps you’re about to do:

  1. Create a new Android project in Android Studio (this tutorial does not include Eclipse and IntelliJ steps)
  2. Go to the Pocketsphinx Android demo GitHub page, open the ‘aars’ directory and download ‘pocketsphinx-android-5prealpha-release.aar’. If that file is not there, it’s probably because there is a newer version of the library. Check the directory and download the file that has an *.aar extension
  3. Go to Android Studio. Click File -> New -> New module -> Import Jar/Aar Package, select the downloaded file -> Finish
  4. Open settings.gradle in your project and (if it’s not there already) add pocketsphinx to your include line:
    include ':app', ':pocketsphinx-android-5prealpha-release'
  5. Open app/build.gradle and add this line to dependencies (newer versions of the Android Gradle plugin replace compile with implementation):
    compile project(':pocketsphinx-android-5prealpha-release')
  6. Add permissions to your project Manifest file. RECORD_AUDIO is needed for the microphone. WRITE_EXTERNAL_STORAGE is there because Pocketsphinx can record your voice commands and save them to the app’s folder; I did not find any use for these files, so I left that permission out of my own project. A way to disable this setting will be shown later.
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
  7. Go to the Pocketsphinx Android demo page on GitHub, download the file assets.xml from the ‘models’ directory, and put it in the app/ folder of your project.
  8. Go back to app/build.gradle in your project and add these lines to the very end of the file:
    ant.importBuild 'assets.xml'
    preBuild.dependsOn(list, checksum)
    clean.dependsOn(clean_assets)
  9. On the Pocketsphinx Android demo page, navigate to models/src/main/assets, download the ‘sync’ folder and copy it to the ‘assets’ folder of your project (app/src/main/assets; create it if it does not exist). This folder contains resources for speech recognition and will be synchronized on the first application run.

That is all for now. Pocketsphinx should be ready for use in your project.

POCKETSPHINX USAGE

The PocketSphinxActivity.java file on the GitHub page covers the whole functionality. You can find it in the app/src/main/java/edu/cmu/pocketsphinx/demo folder. The demo project displays some information on screen, but we will skip that part because I’m pretty sure you want to have your own implementation. I did not make any UI changes; my code runs in the background, and I will explain every part of it. The snippets below assume they live in an Activity that implements RecognitionListener, as PocketSphinxActivity does in the demo; if your Activity has a different name, replace PocketSphinxActivity.this with your own class name. The part where you ask for the RECORD_AUDIO permission is skipped; you have to implement that yourself.

All the fields and methods described in the sections below are mandatory – implement them all.

Initialize fields and constants

    /* Names of the two searches: KWS_SEARCH listens for the keyphrase that starts
       recognition, MENU_SEARCH listens for the list of menu choices. The method
       switchSearch also uses KWS_SEARCH to bring the recognizer back to listening
       for the keyphrase */
    private static final String KWS_SEARCH = "wakeup";
    private static final String MENU_SEARCH = "menu";

    /* Keyword we are looking for to activate recognition */
    private static final String KEYPHRASE = "oh mighty computer";
  
    /* Recognition object */
    private SpeechRecognizer recognizer;

Start recognizer configuration

@Override
public void onCreate(Bundle state) {
    super.onCreate(state);
    runRecognizerSetup();
}

Run recognizer setup

private void runRecognizerSetup() {
    // Recognizer initialization is time-consuming and involves IO,
    // so we execute it in an async task
    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(PocketSphinxActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }

        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                System.out.println(result.getMessage());
            } else {
                switchSearch(KWS_SEARCH);
            }
        }
    }.execute();
}

Initialize your custom dictionary (dictionary explained at the end of article)

private void setupRecognizer(File assetsDir) throws IOException {
    recognizer = SpeechRecognizerSetup.defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // Uncomment this line if you want the recognizer to save raw
            // audio files to the app's storage
            //.setRawLogDir(assetsDir)
            .getRecognizer();

    recognizer.addListener(this);

    // Create keyword-activation search.
    recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

    // Create your custom grammar-based search
    File menuGrammar = new File(assetsDir, "mymenu.gram");
    recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);
}
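
Several of the errors reported in the comments below come down to files missing from the synced assets folder. As a rough sketch, you could run a check like the following on the directory returned by syncAssets() before calling setupRecognizer(). The names AssetCheck and missingAssets are hypothetical and not part of Pocketsphinx; only the three file names come from this tutorial.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Pocketsphinx API.
class AssetCheck {
    // Entries that setupRecognizer() in this tutorial expects to find.
    static final String[] REQUIRED = {"en-us-ptm", "cmudict-en-us.dict", "mymenu.gram"};

    // Returns the names of required entries that are missing from assetsDir.
    static List<String> missingAssets(File assetsDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(assetsDir, name).exists()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

If the returned list is not empty, throwing an IOException that names the missing entries would route the problem through the existing error handling in onPostExecute().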

Destroy recognizer objects on app exit

@Override
public void onStop() {
    super.onStop();

    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }
}

Switch between keyphrase or menu listening

@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;

    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE))
        switchSearch(MENU_SEARCH);
    else {
        System.out.println(text);
    }
}

Print out voice command when recognized as full sentence

@Override
public void onResult(Hypothesis hypothesis) {
    if (hypothesis != null) {
        System.out.println(hypothesis.getHypstr());
    }
}

Custom action on beginning of speech – we don’t need any action

@Override
public void onBeginningOfSpeech() {
}

Reset recognizer back to keyphrase listening, or listen to menu options after end of speech

@Override
public void onEndOfSpeech() {
    if (!recognizer.getSearchName().equals(KWS_SEARCH))
        switchSearch(KWS_SEARCH);
}

This method switches between continuous recognition of the keyphrase and recognition of menu items with a 10-second timeout.

private void switchSearch(String searchName) {
    recognizer.stop();

    if (searchName.equals(KWS_SEARCH))
        recognizer.startListening(searchName);
    else
        recognizer.startListening(searchName, 10000);
}
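
The switching logic amounts to a loop between two states. The plain-Java sketch below models only that control flow, without any Pocketsphinx calls, so the transitions can be read and tested in isolation. SearchFlow and its methods are hypothetical names; the constants mirror the ones defined earlier.

```java
// Hypothetical model of the tutorial's control flow; it makes no Pocketsphinx calls.
class SearchFlow {
    static final String KWS_SEARCH = "wakeup";
    static final String MENU_SEARCH = "menu";
    static final String KEYPHRASE = "oh mighty computer";

    // The recognizer starts out listening for the keyphrase.
    private String current = KWS_SEARCH;

    String current() {
        return current;
    }

    // Mirrors onPartialResult(): hearing the keyphrase opens the menu search.
    void onPartialResult(String text) {
        if (KEYPHRASE.equals(text)) {
            current = MENU_SEARCH;
        }
    }

    // Mirrors onEndOfSpeech() and onTimeout(): both end up in keyphrase listening.
    void onEndOfSpeechOrTimeout() {
        current = KWS_SEARCH;
    }
}
```

In the real Activity, each change of state corresponds to one call to switchSearch().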

Print out any errors

@Override
public void onError(Exception error) {
    System.out.println(error.getMessage());
}

When the 10-second timeout expires, switch back to keyphrase recognition, as no menu command was received

@Override
public void onTimeout() {
    switchSearch(KWS_SEARCH);
}

DICTIONARY

As you probably noticed, we are using our own mymenu.gram file. This is a JSGF grammar file that contains all the options for our menu. Create a new file in assets/sync/ called mymenu.gram and put this inside:

#JSGF V1.0;

grammar mymenu;
public <smart> = (good morning | hello);
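
As the menu grows, you might prefer to generate the grammar text from a list of phrases instead of editing it by hand. The sketch below builds the same JSGF content as the file above. GrammarBuilder and buildMenuGrammar are hypothetical names, and writing the result to assets/sync/mymenu.gram is left to you.

```java
import java.util.List;

// Hypothetical helper that assembles a one-rule JSGF grammar as text.
class GrammarBuilder {
    // Joins the phrases into a single public rule of alternatives.
    static String buildMenuGrammar(String grammarName, String ruleName, List<String> phrases) {
        return "#JSGF V1.0;\n\n"
                + "grammar " + grammarName + ";\n"
                + "public <" + ruleName + "> = (" + String.join(" | ", phrases) + ");\n";
    }
}
```

Calling buildMenuGrammar("mymenu", "smart", ...) with the phrases "good morning" and "hello" reproduces the grammar shown above. To my knowledge, every word used in the grammar also has to be present in the dictionary file (cmudict-en-us.dict here), so check that before adding new phrases.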

Now go back to your onPartialResult() method and change the if statement to this form:

if (text.equals(KEYPHRASE)) {
    switchSearch(MENU_SEARCH);
} else if (text.equals("hello")) {
    System.out.println("Hello to you too!");
} else if (text.equals("good morning")) {
    System.out.println("Good morning to you too!");
} else {
    System.out.println(text);
}
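
Once the menu has more than a couple of entries, the if/else chain gets long. One alternative, sketched here in plain Java, is a lookup table from the recognized text to a reply. CommandTable and replyFor are hypothetical names; the two phrases and replies are the ones used above, and the trimming and lower-casing is my own assumption about how you may want to normalize the text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table for menu commands; not part of Pocketsphinx.
class CommandTable {
    private static final Map<String, String> REPLIES = new HashMap<>();

    static {
        REPLIES.put("hello", "Hello to you too!");
        REPLIES.put("good morning", "Good morning to you too!");
    }

    // Returns the reply for a recognized phrase, or the phrase itself if it is unknown.
    static String replyFor(String text) {
        String key = text.trim().toLowerCase(Locale.ROOT);
        return REPLIES.getOrDefault(key, text);
    }
}
```

In onPartialResult(), the last three branches would then collapse into a single System.out.println(CommandTable.replyFor(text)) call.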

CONCLUSION

… and that’s it! Now you have continuous speech recognition that is activated by a custom-defined keyphrase, with a menu of options. You can extend these options and any other functionality of the recognizer code. I had to tweak a lot myself, as no recognition is 100% bulletproof, but that is something I will leave for you to play with. Good luck.

Vladimir Marton

DevOps Engineer focused on cloud infrastructure, automation, CI/CD and programming in Javascript, Python, PHP and SQL. Guidearea is my oldest project where I write articles about programming, marketing, SEO and others.

45 comments

  • Thank you so much, vlad! This is what I was exactly looking for. Your tutorial saved my days. Here’s something I want to ask you about. While following this tutorial, I got a little stuck. I did everything you mentioned above for this tutorial and now I run my app on Android Studio but the app is replying nothing to me while I am speaking through mic. I even tried it on my real device but it came to me the same result. Could you please tell me how do I test this?

    Is it supposed to automatically start speech recognition right after the app runs?
    If I am wrong, I am sorry.
    Thank you for your great job anyway!

    • Hey Min, sorry for late answer. The speech recognition will start right away and will listen to your keyword – in this tutorial, it’s “oh mighty computer”. Try changing that to “hello”, then run your application, say “hello” and then half a second later try to say some sentence. It should recognize what you’re saying.

  • How can I use pocketsphinx to start an activity without touching the device even when the phone is locked?

  • Hey Vladimir

    Nice tutorial!

    Quick question, whats the difference between the .gram and .dict files?
    Gram files are called grammar files, but dont contain any grammar?
    Do they simply contain words of interest, selected from .dict?
    Possibly in phrases as well?

  • Hi Vladimir,

    thank you a bunch for this tutorial!

    I have only some questions regarding your code, since there are no classes or implements mentioned for the new project that hosts your example code I have to assume that it’s either kept as the generic “MainActivity” and then implements “RecognitionListener” like the PocketSphinxActivity, or has to be renamed to PocketSphinxActivity… or PocketSphinxActivity has to be imported from the pocketsphinxdemo as well ( which I can’t see to be mentioned in the steps for preparation of your sample code ).

    Sorry if it’s obvious but I’m fairly new to Android programming so I’m not certain how to proceed from hereon out.

  • Hi Vladimir, can you please give the github link of this project? I’m having some trouble and i want to verify if i setup all the things properly…

  • Hi Vladimir, Great Tutorial, pretty straight forward, thanks! I also played around with pocketsphinx and stumbled also over some issues. 1. Accuracy for me is very poor. I’m also using a very limited keyword list, but I guess a random picker has the same accuracy… also tried with different people, so It’s probably not only due to my pronunciation. 2. I have also the same experience as another post before stated: sometimes I see concatenation of results. 3. Even background noise serves as speech event. In best case the result is NULL, but quite often it detects a keyword as well.
    What are your experiences with that? I also had another try with Google speech recognizer in offline mode. This doesn’t offer the level of control I require but at least accuracy is a totally different league.
    Thanks for any comments !

  • Hi Vladmir,
    I have no problem in integrating this code but can you tell me what the flow is ? I run the app on my device and I can see the begin speech method getting called. How to proceed after that for keyword search ? Do I have to say the keyphrase first and then the commands or directly commands ? I tried saying both but I was not able to get anything in the log.

  • hi … i wana try to make a game not really game … i wnt to make a list of levels and evry level has few words .. and when the gamer spell it the application will give him his result of his pronociation… plz help m to do that … and thnx

    • This should be very straightforward to do: trigger Pocketsphinx every time you want to display results at each of your levels. Otherwise the Pocketsphinx activity should be in sleep mode; wake it up every time you want speech recognition.

  • Hi thx for the great tutorial, i’m having problem running the demo, because of the minimal requirements is API 24, the device i have is API 22… is there a workaround? thx

  • Hi Vladimir Marton,
    It was good tutorial, but i wanted PocketSphinx app to run continuously based on user input Speech to text. now sample demo app is fixed with digits, forecast, phones but without users input the text is being filled with previous data even if there is no input from user.

    Waiting for your reply, thanks in advance.

  • Hello VLADIMIR MARTON,
    It is really a wonderful article and its really helpful but the issue is that i am confused that where I have to make changes in it. because I want to give dictionary of another language. So, how can I do it? kindly, reply me as soon as possible!

  • hi vladimir for these sample and others. I watched your commands and sync it succesfully. but I wanna ask you a different thing. in netbeans I created a project it takes wav file and gives back the text. in pocketsphinx is it possible? I tried same codes but “StreamSpeechRecognizer” class not identified in android. can you help about it with a sample code?

  • How do I get the confidence level (accuracy) of the generated speech to text output.
    I am trying to use decoder.hyp().getBeastScores() but unable to implement it on my system. Would be great if you could help.

  • Great tutorial! Love it. You make it seem so easy!
    I compiled it using Android Studio and it works for me. A couple of questions on how to recognize.
    How do I recognize numbers such as “1981”. I see it recognizes digits.
    Also, when I speak the keyword, the result text keeps concatenating the key word.

    String text = hypothesis.getHypstr();

    I tried to set text =””, but it still kept concatenating.
    I need one word at a time so I can switch on the text to process further.

    Any ideas?

    Thanks

    Ram

  • Hello Vlandimir i’m Riyaz, i have a project for speech text using Pocketsphinx. but the problem is that its recognize only specific words what is mentioned in the code if it will match then true otherwise false.
    i need to recognize any word what ever i speak either screen is on or lock and also free from language.
    thank you..

    • Hello Riyaz, you cannot use speech recognition when the screen is locked; that is not supported by Android natively. You could maybe try to put the recognizer in a background service, but Android 8 started to limit those as well, and I haven’t tried that, so I cannot confirm. If by “free from language” you mean any language at any time, you have to add those languages (dictionaries). Again, I haven’t tested multiple languages at the same time, so no confirmation here either. To catch any word, you use this method (where in the else statement you can do whatever you want with the result):

      @Override
      public void onPartialResult(Hypothesis hypothesis) {
          if (hypothesis == null)
              return;
          String text = hypothesis.getHypstr();
          if (text.equals(KEYPHRASE))
              switchSearch(MENU_SEARCH);
          else {
              System.out.println(text);
          }
      }

  • Hi, vlad,
    Nice tutorial there. Could you please put up a empty project with these settings (build.gradle, etc.) set in ready along with this article? Because I’m using AIDE (apk) on mobile development platform and messed up a lot with build.gradle setting to integrate pocketsphinx into a project. If you put up an empty project, then I think I could figure out how the project structure looks like on my IDE (AIDE app). By the way AIDE is great IDE and I’ve been using it for a few years.

    • Hi Myo, the project is the same as if you create an empty one and fill it with the values from my tutorial. I’m not going to create an empty project in this tutorial, as it’s more advanced and it would only make the whole article oversized. BTW I highly recommend switching to Android Studio, it’s a very powerful and robust solution.

    • hi vlad i want to add the “en-in” language model instead of “en-us” language model in my android app.
      can u help me in this .
      which files are required or need to be changed for this purpose .

      thankx in advance

    • Hello again, I hope you found an answer to your question, but if not, here it goes:
      Basically you have to download acoustic model and dictionary for your language in step “INITIALIZE YOUR CUSTOM DICTIONARY”. You can see that I use en-us-ptm as an acoustic model and cmudict-en-us.dict as a dictionary.

      recognizer = SpeechRecognizerSetup.defaultSetup()
                  .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                  .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
      

      Try to find these two files for your language on CMUSphinx/Pocketsphinx sites and implement them into this solution.

  • do i have to create a folder named “assets” in app folder ?? in which i will copy the sync folder ??

  • hi vlad im getting the following error in the asset file can u help?

    ant.importBuild 'assets.xml'
    preBuild.dependsOn(list, checksum)
    clean.dependsOn(clean_assets)

    Error:Execution failed for task ‘:app:checksum’.
    > D:\usman\practice apps\MyApplication\MyApplication3\app\src\main\assets\sync does not exist.

    • Hello Usman, check if the folder ‘sync’ stated in the exception exists. If it does, then the problem might be in the folder ‘practice apps’; in general it’s not a good idea to have folders with spaces in their names in the path.

      If the folder ‘sync’ does not exist, then you don’t have all the files required. In this case I suggest that you go through the beginning of this tutorial again.

  • Very good article.I tried the sample using PocketSphinx android.It is working fine on the Android mobile device and M300 Vuzix smart glass.However, when I install the same demo app on google glass 2, it is not recognizing any keyword.Any ideas?
    Note: demo app is same as given on PocketSphinx android GitHub home page

    • Thank you Rahul. Sorry for the late response. I have never worked with Google Glass, so I do not know the system configuration of the device and so on. So I’m sorry but I’m not going to be able to help you there :/

  • This is exactly what I was looking for. Much more clear than the documentation. Thanks so much!

  • @Override
    public void onTimeout() {
    switchSearch(KWS_SEARCH);
    }

    can we active listening again when time out.plz help

  • hi, vladimir Marton can we run a pocketsphinx app in the background all time.when a user calls it response.

  • not working
    Open_Package: onBeginningOfSpeech
    Open_Package: onEndOfSpeech
    i try to say “hello” but not working

    • Hello Nome,
      there are many reasons why this could fail. If Pocketsphinx cannot recognize the voice, I would start with checking the settings of the AVD; maybe the virtual device cannot use your microphone at all.

    • Hi Sid, sorry, I barely have time for the written tutorials, so don’t expect any video tutorials anytime soon. Hopefully the written tutorial is sufficient.