
Android Tutorial – Continuous Speech Recognition with Pocketsphinx

2nd September 2017


During my latest project (Smart Mirror), I wanted to implement continuous speech recognition that would run without stopping. I spent a lot of time looking for a library that would work well, and two of them are worth mentioning: DroidSpeech and Pocketsphinx. DroidSpeech is a nice Android library that gives you continuous speech recognition, although parts of it were not as configurable as I had hoped. Pocketsphinx came in to save the day.

INTRODUCTION TO SPEECH RECOGNITION

As I had no experience with speech recognition libraries before I started this project, it was a bit complicated and time-consuming for me to implement such a feature. I could not find a step-by-step tutorial that would make things easier and faster, which is why I’m putting together a small walk-through.

This article describes how to use a library called Pocketsphinx, which provides this functionality. I suggest that you read the project pages before going further, so that you have a fundamental understanding of how the library works. The project is available at these URLs: https://github.com/cmusphinx/pocketsphinx and https://cmusphinx.github.io/wiki/tutorialandroid/

In case you want to check out Vikram Ezhil’s DroidSpeech, you can proceed to this URL: https://github.com/vikramezhil/DroidSpeech

PREPARATIONS

These are the first steps:

Create a new Android project in Android Studio (this tutorial does not cover Eclipse or IntelliJ)
Go to the Pocketsphinx Android demo GitHub page, open the ‘aars‘ directory and download ‘pocketsphinx-android-5prealpha-release.aar‘. If that file is not there, there is probably a newer version of the library; look in the directory and download the file that has an *.aar extension
Go to Android Studio. Click File -> New -> New module -> Import Jar/Aar Package, select the downloaded file and click Finish
Open settings.gradle in your project and (if it’s not there already) add pocketsphinx to your include line:
```
include ':app', ':pocketsphinx-android-5prealpha-release'
```

Open app/build.gradle and add this line to dependencies (newer versions of the Android Gradle plugin use implementation in place of compile):

compile project(':pocketsphinx-android-5prealpha-release')

Add permissions to your project’s manifest file. Pocketsphinx can record your voice commands and save them to the app’s folder, which is what the WRITE_EXTERNAL_STORAGE line below relates to. I did not find any use for these files, so I did not include that permission in my own project. A way to disable the recording is shown later.
```
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
Go to the Pocketsphinx Android demo page on GitHub, download the file assets.xml from the ‘models‘ directory, and put it in the app/ folder of your project.

Go back to app/build.gradle in your project and add these lines at the very end of the file:

ant.importBuild 'assets.xml'
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)

On the Pocketsphinx Android demo page, navigate to models/src/main/assets, download the ‘sync’ folder and copy it to the ‘assets‘ folder of your project (app/src/main/assets; create it if it does not exist). This folder contains resources for speech recognition and will be synchronized on the first application run.

That is all for now. Pocketsphinx should be ready for use in your project.

POCKETSPHINX USAGE

The PocketSphinxActivity.java file on the GitHub page covers the whole functionality. You can find it in the app/src/main/java/edu/cmu/pocketsphinx/demo folder. The demo project displays some information on screen, but we will skip that part because I’m pretty sure you want your own implementation. I did not make any UI changes; my code runs in the background, and I will explain every part of it. The snippets below live in an Activity that implements RecognitionListener (it is called PocketSphinxActivity here, as in the demo; use your own class name if it differs). The part where you ask for the RECORD_AUDIO permission is skipped; you have to implement that yourself.

All the fields and methods described in the sections below are mandatory – implement them all.

Initialize fields and constants

    /* We only need the keyphrase to start recognition, one menu with list of choices,
       and one word that is required for method switchSearch - it will bring recognizer
       back to listening for the keyphrase*/
    private static final String KWS_SEARCH = "wakeup";
    private static final String MENU_SEARCH = "menu";

    /* Keyword we are looking for to activate recognition */
    private static final String KEYPHRASE = "oh mighty computer";
  
    /* Recognition object */
    private SpeechRecognizer recognizer;

Start recognizer configuration

@Override
public void onCreate(Bundle state) {
    super.onCreate(state);
    runRecognizerSetup();
}

Run recognizer setup

private void runRecognizerSetup() {
    // Recognizer initialization is time-consuming and involves IO,
    // so we execute it in an async task
    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(PocketSphinxActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }

        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                System.out.println(result.getMessage());
            } else {
                switchSearch(KWS_SEARCH);
            }
        }
    }.execute();
}
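
AsyncTask has been deprecated since Android API level 30, so newer projects may prefer a plain executor for the same job. The sketch below is only an illustration of that pattern in plain Java: the class `BackgroundSetup` and its `run` method are hypothetical names, not part of Pocketsphinx or Android. It runs a slow setup step on a worker thread and reports either null (success) or the exception, which is the same contract as onPostExecute(Exception) above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of Pocketsphinx: runs a slow setup step off the
// calling thread and reports either null (success) or the exception it threw.
class BackgroundSetup {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // setup:  the slow work (in this tutorial: syncAssets() plus setupRecognizer()).
    // onDone: receives null on success, mirroring onPostExecute(Exception).
    Future<?> run(Callable<Void> setup, Consumer<Exception> onDone) {
        return executor.submit(() -> {
            Exception failure = null;
            try {
                setup.call();
            } catch (Exception e) {
                failure = e;
            }
            // Note: this callback runs on the worker thread. In an Android app it
            // would have to be posted to the main thread before touching the UI.
            onDone.accept(failure);
        });
    }

    // Call this when the helper is no longer needed (for example in onStop()).
    void shutdown() {
        executor.shutdown();
    }
}
```

In the Activity, the setup argument would wrap the two calls from doInBackground(), and the callback would call switchSearch(KWS_SEARCH) when it receives null.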

Initialize your custom dictionary (dictionary explained at the end of article)

private void setupRecognizer(File assetsDir) throws IOException {
    recognizer = SpeechRecognizerSetup.defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // Uncomment this line if you want the recognizer to save raw
            // audio files to the app's storage
            //.setRawLogDir(assetsDir)
            .getRecognizer();

    recognizer.addListener(this);

    // Create keyword-activation search.
    recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

    // Create your custom grammar-based search
    File menuGrammar = new File(assetsDir, "mymenu.gram");
    recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);
}
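
A keyphrase or grammar can only be built from words that the pronunciation dictionary defines, so it can help to check your phrases against the .dict file before debugging anything else. The sketch below is a hypothetical helper, not part of Pocketsphinx. It assumes the usual CMU dictionary layout: one entry per line, the word first and its phonemes after it, with alternative pronunciations written as `word(2)`. The sample lines in the usage note are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pocketsphinx.
class DictionaryCheck {
    // Collects the words defined by dictionary lines such as
    // "hello HH AH L OW" or "hello(2) HH EH L OW" (assumed format).
    static Set<String> wordsIn(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String word = trimmed.split("\\s+")[0];
            int paren = word.indexOf('(');   // strip the "(2)" of alternative entries
            if (paren > 0) word = word.substring(0, paren);
            words.add(word);
        }
        return words;
    }

    // Returns the words of a phrase that the dictionary does not define.
    static List<String> missingWords(String phrase, Set<String> dictWords) {
        List<String> missing = new ArrayList<>();
        for (String word : phrase.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !dictWords.contains(word)) missing.add(word);
        }
        return missing;
    }
}
```

For example, with a dictionary that only contains "hello" and "mighty", missingWords("oh mighty computer", words) returns the list [oh, computer]. In the app, the lines would come from the cmudict-en-us.dict file in the synchronized assets directory.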

Destroy recognizer objects on app exit

@Override
public void onStop() {
    super.onStop();

    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }
}

Switch between keyphrase or menu listening

@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;

    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE))
        switchSearch(MENU_SEARCH);
    else {
        System.out.println(hypothesis.getHypstr());
    }
}

Print out voice command when recognized as full sentence

@Override
public void onResult(Hypothesis hypothesis) {
    if (hypothesis != null) {
        System.out.println(hypothesis.getHypstr());
    }
}

Custom action on beginning of speech – we don’t need any action

@Override
public void onBeginningOfSpeech() {
}

Reset recognizer back to keyphrase listening, or listen to menu options after end of speech

@Override
public void onEndOfSpeech() {
    if (!recognizer.getSearchName().equals(KWS_SEARCH))
        switchSearch(KWS_SEARCH);
}

This method switches between continuous recognition of the keyphrase and recognition of menu items with a 10-second timeout.

private void switchSearch(String searchName) {
    recognizer.stop();

    if (searchName.equals(KWS_SEARCH))
        recognizer.startListening(searchName);
    else
        recognizer.startListening(searchName, 10000);
}

Print out any errors

@Override
public void onError(Exception error) {
    System.out.println(error.getMessage());
}

Once the 10-second timeout expires, switch back to keyphrase recognition, as no menu command was received

@Override
public void onTimeout() {
    switchSearch(KWS_SEARCH);
}
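
To see how the callbacks above fit together, here is a simplified model of the control flow in plain Java. It does not use Pocketsphinx at all: the class `SearchFlow` is a hypothetical stand-in that only tracks which search would be active, assuming each callback behaves exactly as in the snippets above.

```java
// Hypothetical model of this tutorial's control flow; it does not use Pocketsphinx.
class SearchFlow {
    static final String KWS_SEARCH = "wakeup";
    static final String MENU_SEARCH = "menu";
    static final String KEYPHRASE = "oh mighty computer";

    // State after the initial switchSearch(KWS_SEARCH) in onPostExecute().
    private String activeSearch = KWS_SEARCH;

    String activeSearch() {
        return activeSearch;
    }

    // Mirrors onPartialResult(): only the keyphrase moves us to the menu search.
    void onPartialResult(String text) {
        if (text != null && text.equals(KEYPHRASE)) {
            activeSearch = MENU_SEARCH;
        }
    }

    // Mirrors onEndOfSpeech(): any search other than the keyphrase one falls back.
    void onEndOfSpeech() {
        if (!activeSearch.equals(KWS_SEARCH)) {
            activeSearch = KWS_SEARCH;
        }
    }

    // Mirrors onTimeout(): no menu command arrived within the timeout.
    void onTimeout() {
        activeSearch = KWS_SEARCH;
    }
}
```

In other words: the app starts in the keyphrase search, saying the keyphrase moves it to the menu search, and either the end of the next utterance or the timeout moves it back. Anything said while the keyphrase search is active and that is not the keyphrase leaves the state unchanged.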

DICTIONARY

As you probably noticed, we are using our own mymenu.gram file. This file is going to contain all the options for our menu. Create a new file in assets/sync/ called mymenu.gram and put this inside:

#JSGF V1.0;

grammar mymenu;
public <smart> = (good morning | hello);
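
If your menu grows, writing the grammar by hand gets tedious. As a rough sketch, the hypothetical helper below (not part of Pocketsphinx) renders a one-rule grammar in the same shape as the mymenu.gram file above from a list of phrases. It only covers this simple "list of alternatives" form, not the rest of the JSGF syntax.

```java
import java.util.List;

// Hypothetical helper: renders a one-rule JSGF grammar shaped like mymenu.gram.
class GrammarBuilder {
    static String build(String grammarName, String ruleName, List<String> phrases) {
        if (phrases.isEmpty()) {
            throw new IllegalArgumentException("at least one phrase is required");
        }
        // Header, grammar name, then a single public rule listing the alternatives.
        return "#JSGF V1.0;\n\n"
                + "grammar " + grammarName + ";\n"
                + "public <" + ruleName + "> = (" + String.join(" | ", phrases) + ");\n";
    }
}
```

Calling GrammarBuilder.build("mymenu", "smart", Arrays.asList("good morning", "hello")) produces the same three lines as the file above, and the result could be written to assets/sync/mymenu.gram before building the app.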

Now go back to your onPartialResult() method and change the if statement to this form:

if (text.equals(KEYPHRASE)) {
    switchSearch(MENU_SEARCH);
} else if (text.equals("hello")) {
    System.out.println("Hello to you too!");
} else if (text.equals("good morning")) {
    System.out.println("Good morning to you too!");
} else {
    System.out.println(hypothesis.getHypstr());
}
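
With more than a handful of menu items, the if/else chain becomes hard to maintain. One possible alternative is a lookup table from phrase to reply. The class `MenuReplies` below is a hypothetical sketch in plain Java, using the two phrases from the grammar above; the trimming and lower-casing of the input is my own assumption about how you may want to normalize the recognized text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to the if/else chain: phrase -> reply lookup.
class MenuReplies {
    private final Map<String, String> replies = new HashMap<>();

    MenuReplies() {
        replies.put("hello", "Hello to you too!");
        replies.put("good morning", "Good morning to you too!");
    }

    // Returns the reply registered for a recognized phrase. When nothing is
    // registered, the text is returned unchanged, like the final else branch
    // that simply prints the hypothesis.
    String replyFor(String text) {
        return replies.getOrDefault(text.trim().toLowerCase(), text);
    }
}
```

Inside onPartialResult(), the keyphrase check would stay as it is, and the remaining branches would collapse into a single System.out.println(menuReplies.replyFor(text)) call.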

CONCLUSION

… and that’s it! Now you have continuous speech recognition that is activated by a custom-defined keyphrase, with a menu of options. You can extend these options and any other functionality of the recognizer code. I myself had to tweak a lot, as no recognition is 100% bulletproof, but that is something I will leave for you to play with. Good luck.

Vladimir Marton

DevOps Engineer focused on cloud infrastructure, automation, CI/CD and programming in Javascript, Python, PHP and SQL. Guidearea is my oldest project where I write articles about programming, marketing, SEO and others.


45 comments

Min Young Lee says:

12th October 2018 at 02:35

Thank you so much, vlad! This is what I was exactly looking for. Your tutorial saved my days. Here’s something I want to ask you about. While following this tutorial, I got a little stuck. I did everything you mentioned above for this tutorial and now I run my app on Android Studio but the app is replying nothing to me while I am speaking through mic. I even tried it on my real device but it came to me the same result. Could you please tell me how do I test this?

Is it supposed to automatically start speech recognition right after the app runs?
If I am wrong, I am sorry.
Thank you for your great job anyway!
- Vladimir Marton says:
  
  15th October 2018 at 11:13
  
  Hey Min, sorry for late answer. The speech recognition will start right away and will listen to your keyword – in this tutorial, it’s “oh mighty computer”. Try changing that to “hello”, then run your application, say “hello” and then half a second later try to say some sentence. It should recognize what you’re saying.
Bibin Johny says:

5th September 2018 at 19:48

How can I use pocketsphinx to start an activity without touching the device even when the phone is locked?
Adrian Xu says:

26th August 2018 at 10:08

Hey Vladimir

Nice tutorial!

Quick question, whats the difference between the .gram and .dict files?
Gram files are called grammar files, but dont contain any grammar?
Do they simply contain words of interest, selected from .dict?
Possibly in phrases as well?
Fabian says:

23rd July 2018 at 16:47

Hi Vladimir,

thank you a bunch for this tutorial!

I have only some questions regarding your code, since there are no classes or implements mentioned for the new project that hosts your example code I have to assume that it’s either kept as the generic “MainActivity” and then implements “RecognitionListener” like the PocketSphinxActivity, or has to be renamed to PocketSphinxActivity… or PocketSphinxActivity has to be imported from the pocketsphinxdemo as well ( which I can’t see to be mentioned in the steps for preparation of your sample code ).

Sorry if it’s obvious but I’m fairly new to Android programming so I’m not certain how to proceed from hereon out.
Kamran says:

27th March 2018 at 18:14

Hi Vladimir, can you please give the github link of this project? I’m having some trouble and i want to verify if i setup all the things properly…
Joernpeter Pook says:

27th March 2018 at 08:07

Hi Vladimir, Great Tutorial, pretty straight forward, thanks! I also played around with pocketsphinx and stumbled also over some issues. 1. Accuracy for me is very poor. I’m also using a very limited keyword list, but I guess a random picker has the same accuracy… also tried with different people, so It’s probably not only due to my pronunciation. 2. I have also the same experience as another post before stated: sometimes I see concatenation of results. 3. Even background noise serves as speech event. In best case the easy.is NULL, but quite often it detects a keyword as well.
What are your experiences with that? I also had another try with Google speech recognizer in offline mode. This doesn’t offer the level of control.i require but at least accuracy is a totally different league.
Thanks for any comments !
Vivek Mishra says:

26th March 2018 at 09:42

Hi Vladmir,
I have no problem in integrating this code but can you tell me what the flow is ? I run the app on my device and I can see the begin speech method getting called. How to proceed after that for keyword search ? Do I have to say the keyphrase first and then the commands or directly commands ? I tried saying both but I was not able to get anything in the log.
MiRa says:

24th March 2018 at 02:47

hi … i wana try to make a game not really game … i wnt to make a list of levels and evry level has few words .. and when the gamer spell it the application will give him his result of his pronociation… plz help m to do that … and thnx
- Deviabeast says:
  
  26th March 2018 at 09:10
  
  This should be very straightforward to do, just try to trigger Pocketsphinx every time you want display results at each of your levels, otherwise Pocketsphinx activity should be in sleep mode, wake it up every time you want speech recognition.
William says:

22nd March 2018 at 15:09

Hi thx for the great tutorial, i’m having problem running the demo, because of the minimal requirements is API 24, the device i have is API 22… is there a workaround? thx
chaithra says:

1st March 2018 at 13:39

Hi Vladimir Marton,
It was good tutorial, but i wanted PocketSphinx app to run continuously based on user input Speech to text. now sample demo app is fixed with digits, forecast, phones but without users input the text is being filled with previous data even if there is no input from user.

Waiting for your reply, thanks in advance.
sheharyar says:

21st February 2018 at 14:18

Hello VLADIMIR MARTON,
It is really a wonderful article and its really helpful but the issue is that i am confused that where I have to make changes in it. because I want to give dictionary of another language. So, how can I do it? kindly, reply me as soon as possible!
volkanucar says:

15th February 2018 at 22:50

hi vladimir for these sample and others. I watched your commands and sync it succesfully. but I wanna ask you a different thing. in netbeans I created a project it takes wav file and gives back the text. in pocketsphinx is it possible? I tried same codes but “StreamSpeechRecognizer” class not identified in android. can you help about it with a sample code?
Karthik says:

8th February 2018 at 08:08

How do I get the confidence level (accuracy) of the generated speech to text output.
I am trying to use decoder.hyp().getBeastScores() but unable to implement it on my system. Would be great if you could help.
- Karthik says:
  
  8th February 2018 at 08:10
  
  ****decoder.hyp().getBestScores()
  https://stackoverflow.com/questions/20825654/cmusphinx-what-is-the-score-of-a-recognised-hypothesis
  I tried above solution but having problem with implementation
- Karthik says:
  
  8th February 2018 at 08:20
  
  config = recognizer.getDecoder().getConfig();
  this.decoder = new Decoder(config);
  Log.e(TAG, decoder.hyp().getBestScore() + “”);
  
  I tried this method and got the following error:A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 23173 (AsyncTask #1)
- Vladimir Marton says:
  
  8th February 2018 at 09:02
  Hello Karthik, unfortunately i have no experience with confidence level. Where exactly in the code do you call .getBestScore() ? Have you tried putting it inside of the onPartialResult() method? You can also read this answer from stackoverflow – just the first part of it:
```
https://stackoverflow.com/a/37279407/6148510
```
Ram Ramesh says:

19th January 2018 at 15:13

In the description, you do mention your code runs in the background?
Ram Ramesh says:

19th January 2018 at 15:11

Great tutorial! Love it. You make it seem so easy!
I compiled it using Android Studio and it works for me. A couple of questions on how to recognize.
How do I recognize numbers such as “1981”. I see it recognizes digits.
Also, when I speak the keyword, the result text keeps concatenating the key word.

String text = hypothesis.getHypstr();

I tried to set text =””, but it still kept concatenating.
I need one word at a time so I can switch on the text to process further.

Any ideas?

Thanks

Ram
Riyazuddin Khan says:

16th January 2018 at 12:15

Hello Vlandimir i’m Riyaz, i have a project for speech text using Pocketsphinx. but the problem is that its recognize only specific words what is mentioned in the code if it will match then true otherwise false.
i need to recognize any word what ever i speak either screen is on or lock and also free from language.
thank you..
- Vladimir Marton says:
  
  16th January 2018 at 12:26
  Hello Riyaz, you cannot use speech recognition when the screen is locked; Android does not support that natively. You could maybe try to put the recognizer in a background service, but Android 8 started to limit those as well, and I haven’t tried it, so I cannot confirm. If by “free from language” you mean any language at any time, you have to add those languages (dictionaries). Again, I haven’t tested multiple languages at the same time, so no confirmation here either. To catch any word, you can use this method (in the else statement you can do whatever you want with the result):
```
@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;
    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE))
        switchSearch(MENU_SEARCH);
    else {
        System.out.println(hypothesis.getHypstr());
    }
}
```
- Riyazuddin Khan says:
  
  16th January 2018 at 14:05
  
  Thank you Vladimir i’m so glad that i got ur reply
Myo says:

8th January 2018 at 18:54

Hi, vlad,
Nice tutorial there. Could you please put up a empty project with these settings (build.gradle, etc.) set in ready along with this article? Because I’m using AIDE (apk) on mobile development platform and messed up a lot with build.gradle setting to integrate pocketsphinx into a project. If you put up an empty project, then I think I could figure out how the project structure looks like on my IDE (AIDE app). By the way AIDE is great IDE and I’ve been using it for a few years.
- Vladimir Marton says:
  
  8th January 2018 at 19:01
  
  Hi Myo, the project is the same as if you create an empty one and fill it with the values from my tutorial. I’m not going to create an empty project in this tutorial, as its more advanced and it would only make the whole article oversized. BTW I highly recommend switching to Android Studio, it’s a very powerful and robust solution.
m usman says:

18th December 2017 at 11:06

resolved the error thanx vlad very informative tutorial 🙂
- Vladimir Marton says:
  
  18th December 2017 at 15:04
  
  No problem mate, sorry I was not around, I was at work so I could not answer sooner. Glad to hear you managed to fix it 🙂
- m usman says:
  
  18th December 2017 at 18:50
  
  hi vlad i want to add the “en-in” language model instead of “en-us” language model in my android app.
  can u help me in this .
  which files are required or need to be changed for this purpose .
  
  thankx in advance
- Vladimir Marton says:
  
  20th December 2017 at 16:00
  Hello again, I hope you found an answer to your question, but if not, here it goes:
  Basically you have to download acoustic model and dictionary for your language in step “INITIALIZE YOUR CUSTOM DICTIONARY”. You can see that I use en-us-ptm as an acoustic model and cmudict-en-us.dict as a dictionary.
```
recognizer = SpeechRecognizerSetup.defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
```
  Try to find these two files for your language on CMUSphinx/Pocketsphinx sites and implement them into this solution.
- m usman says:
  
  20th December 2017 at 20:58
  
  thanx mate
m usman says:

18th December 2017 at 10:51

do i have to create a folder named “assets” in app folder ?? in which i will copy the sync folder ??
m usman says:

18th December 2017 at 08:30

hi vlad im getting the following error in the asset file can u help?

nt.importBuild ‘assets.xml’
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)

Error:Execution failed for task ‘:app:checksum’.
> D:\usman\practice apps\MyApplication\MyApplication3\app\src\main\assets\sync does not exist.
- Vladimir Marton says:
  
  18th December 2017 at 08:44
  
  Hello Usman, check whether the folder ‘sync’ named in the exception exists. If it does, then the problem might be the folder ‘practice apps’; in general it’s not a good idea to have folders with spaces in the project path.
  
  If the folder ‘sync’ does not exist, then you don’t have all the required files. In this case I suggest that you go through the beginning of this tutorial again.
Rahul Sharma says:

15th December 2017 at 11:52

Very good article.I tried the sample using PocketSphinx android.It is working fine on the Android mobile device and M300 Vuzix smart glass.However, when I install the same demo app on google glass 2, it is not recognizing any keyword.Any ideas?
Note: demo app is same as given on PocketSphinx android GitHub home page
- Vladimir Marton says:
  
  17th December 2017 at 14:17
  
  Thank you Rahul. Sorry for the late response. I have never worked with Google Glass, so I do not know the system configuration of the device and so on. I’m sorry, but I’m not going to be able to help you there :/
Kevin Brock says:

3rd December 2017 at 18:55

This is exactly what I was looking for. Much more clear than the documentation. Thanks so much!
- Vladimir Marton says:
  
  4th December 2017 at 15:13
  
  Thanks for the positive feedback Kevin 🙂
aslam says:

24th November 2017 at 14:06

@Override
public void onTimeout() {
switchSearch(KWS_SEARCH);
}

can we active listening again when time out.plz help
- Vladimir Marton says:
  
  24th November 2017 at 15:12
  
  This pocketsphinx installation listens all the time, so no worries about that 🙂
aslam says:

24th November 2017 at 13:56

hi, vladimir Marton can we run a pocketsphinx app in the background all time.when a user calls it response.
- Vladimir Marton says:
  
  24th November 2017 at 15:12
  
  Hello Aslam, I’m not sure if pocketsphinx can run as a background service, especially after Google restricted background services from Android 8.0 above.
nome says:

22nd November 2017 at 12:54

not working
Open_Package: onBeginningOfSpeech
Open_Package: onEndOfSpeech
i try to say “hello” but not working
- Vladimir Marton says:
  
  23rd November 2017 at 08:20
  
  Hello Nome,
  there are many reasons why this could fail. If Pocketsphinx cannot recognize the voice, I would start by checking the settings of the AVD; maybe the virtual device cannot use your microphone at all.
SiD says:

12th November 2017 at 14:48

bro to good bro can you make a video tutorial for this……….
- Vladimir Marton says:
  
  12th November 2017 at 19:33
  
  Hi Sid, sorry, I barely have time for the written tutorials, so don’t expect any video tutorials anytime soon. Hopefully the written tutorial is sufficient.
