Convenient transcription for quick processing of audio recordings


Audio-to-text conversion is widely used, for example, to create subtitles for videos, minutes of meetings, and transcripts of interviews. The ML Kit service makes the process much easier: it converts audio recordings into text with high accuracy and correct punctuation.

Preparation for development

Set up the Huawei Maven repository and integrate the Audio Transcription SDK. A detailed description of the process can be found here.

Specifying permissions in the AndroidManifest.xml file

Open the AndroidManifest.xml file in the main folder. Before the <application tag, add the permissions to connect to the network, check the network state, and read data from storage. Note that READ_EXTERNAL_STORAGE must also be requested dynamically at runtime; otherwise the app will fail with a Permission Denied error.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

Development process

Creation and initialization of the audio transcription engine

Override the onCreate method in the MainActivity class to create the audio transcription engine.

private MLRemoteAftEngine mAnalyzer;

// In onCreate:
mAnalyzer = MLRemoteAftEngine.getInstance();
mAnalyzer.init(this);
mAnalyzer.setAftListener(mAsrListener);

Use MLRemoteAftSetting to configure the engine. Currently the service supports Mandarin Chinese and English, so the only valid options for the language code are zh and en.

MLRemoteAftSetting setting = new MLRemoteAftSetting.Factory()
        .setLanguageCode("zh")
        .enablePunctuation(true)
        .enableWordTimeOffset(true)
        .enableSentenceTimeOffset(true)
        .create();

The enablePunctuation parameter specifies whether punctuation is automatically inserted into the transcribed text. The default value is false. If the parameter is set to true, the resulting text is punctuated automatically; if false, the text is returned without punctuation.

The enableWordTimeOffset parameter specifies whether to return a timestamp for each segment (word) of the audio recording. The default value is false. Use this option only for recordings shorter than one minute. If the parameter is set to true, segment timestamps are returned together with the transcript; if false, only the transcript is returned.
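Since word-level timestamps are only intended for recordings under a minute, that restriction can be captured with a small guard before building the settings. A minimal, illustrative sketch in plain Java (the AftConfigHelper class and its method are hypothetical helpers, not part of the SDK; only the one-minute limit comes from the restriction above):

```java
public class AftConfigHelper {
    // Word-level timestamps are only recommended for recordings shorter than one minute.
    static final long MAX_WORD_OFFSET_DURATION_MS = 60_000;

    // Decide whether enableWordTimeOffset should be switched on for a given recording.
    static boolean shouldEnableWordOffsets(long durationMs) {
        return durationMs > 0 && durationMs < MAX_WORD_OFFSET_DURATION_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldEnableWordOffsets(45_000));  // 45-second clip -> true
        System.out.println(shouldEnableWordOffsets(180_000)); // three-minute recording -> false
    }
}
```

The returned value can then be passed straight to enableWordTimeOffset when constructing the MLRemoteAftSetting.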

The enableSentenceTimeOffset parameter specifies whether to return a timestamp for each sentence in the audio recording. The default value is false. If the parameter is set to true, sentence timestamps are returned together with the transcript; if false, only the transcript is returned.

Create a listener callback to handle the transcription result.

private MLRemoteAftListener mAsrListener = new MLRemoteAftListener() {
    // The onInitComplete, onUploadProgress, onEvent, and onResult callbacks are overridden below.
};

After the listener is initialized, call the engine's startTask method to start the transcription.

public void onInitComplete(String taskId, Object ext) {
    Log.i(TAG, "MLRemoteAftListener onInitComplete " + taskId);
}

Override the onUploadProgress, onEvent, and onResult methods of MLRemoteAftListener.

public void onUploadProgress(String taskId, double progress, Object ext) {
    Log.i(TAG, "MLRemoteAftListener onUploadProgress is " + taskId + " " + progress);
}

public void onEvent(String taskId, int eventId, Object ext) {
    Log.e(TAG, "MLAsrCallBack onEvent " + eventId);
    if (MLAftEvents.UPLOADED_EVENT == eventId) { // The file has been uploaded successfully.
        startQueryResult(); // Obtain the transcription result.
    }
}

public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
    Log.i(TAG, "onResult get " + taskId);
    if (result != null) {
        Log.i(TAG, "onResult isComplete " + result.isComplete());
        if (!result.isComplete()) {
            return;
        }
        if (null != mTimerTask) {
            mTimerTask.cancel(); // Stop polling once the result is complete.
        }
        if (result.getText() != null) {
            Log.e(TAG, result.getText());
        }

        List<MLRemoteAftResult.Segment> segmentList = result.getSegments();
        if (segmentList != null && segmentList.size() != 0) {
            for (MLRemoteAftResult.Segment segment : segmentList) {
                Log.e(TAG, "MLAsrCallBack segment text is : " + segment.getText() + ", startTime is : " + segment.getStartTime() + ". endTime is : " + segment.getEndTime());
            }
        }

        List<MLRemoteAftResult.Segment> words = result.getWords();
        if (words != null && words.size() != 0) {
            for (MLRemoteAftResult.Segment word : words) {
                Log.e(TAG, "MLAsrCallBack word text is : " + word.getText() + ", startTime is : " + word.getStartTime() + ". endTime is : " + word.getEndTime());
            }
        }

        List<MLRemoteAftResult.Segment> sentences = result.getSentences();
        if (sentences != null && sentences.size() != 0) {
            for (MLRemoteAftResult.Segment sentence : sentences) {
                Log.e(TAG, "MLAsrCallBack sentence text is : " + sentence.getText() + ", startTime is : " + sentence.getStartTime() + ". endTime is : " + sentence.getEndTime());
            }
        }
    }
}

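The sentence-level timestamps logged above are exactly what subtitle generation, one of the use cases mentioned at the start, needs. As an illustrative sketch in plain Java (the SrtFormatter class is hypothetical, not part of the SDK, and the start/end offsets are assumed to be in milliseconds), a sentence plus its offsets can be turned into an SRT-style subtitle cue:

```java
public class SrtFormatter {
    // Convert a millisecond offset into the SRT timestamp format HH:MM:SS,mmm.
    static String srtTime(long ms) {
        long h = ms / 3_600_000, m = (ms / 60_000) % 60, s = (ms / 1000) % 60;
        return String.format("%02d:%02d:%02d,%03d", h, m, s, ms % 1000);
    }

    // Build one numbered SRT cue from a sentence's text and start/end offsets.
    static String srtCue(int index, long startMs, long endMs, String text) {
        return index + "\n" + srtTime(startMs) + " --> " + srtTime(endMs) + "\n" + text + "\n";
    }

    public static void main(String[] args) {
        // Values mimic one sentence returned by getSentences().
        System.out.print(srtCue(1, 1500, 4250, "Hello and welcome."));
        // Prints:
        // 1
        // 00:00:01,500 --> 00:00:04,250
        // Hello and welcome.
    }
}
```

In a real app the loop over result.getSentences() would feed getText(), getStartTime(), and getEndTime() into such a formatter instead of only logging them.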

Processing the transcription result in polling mode

After the transcription is complete, call getLongAftResult to obtain the result. Poll for the result every 10 seconds.

private void startQueryResult() {
    Timer mTimer = new Timer();
    mTimerTask = new TimerTask() {
        @Override
        public void run() {
            getResult();
        }
    };
    mTimer.schedule(mTimerTask, 5000, 10000); // Process the obtained long speech transcription result every 10 seconds.
}

private void getResult() {
    Log.e(TAG, "getResult");
    mAnalyzer.getLongAftResult(mLongTaskId); // mLongTaskId is the task ID returned when the task was started.
}
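The schedule(task, delay, period) pattern used here is plain java.util.Timer behavior and can be tried outside Android. A scaled-down sketch (the PollingSketch class is illustrative; delays are shortened from the article's 5000/10000 ms so it finishes quickly, and the counter stands in for getLongAftResult calls):

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {
    // Run a polling task with Timer.schedule(delay, period) until it has fired n times,
    // then cancel the timer -- mirroring startQueryResult() and the cancel in onResult().
    static int pollTimes(int n, long delayMs, long periodMs) throws InterruptedException {
        AtomicInteger fired = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(n);
        Timer timer = new Timer();
        TimerTask poll = new TimerTask() {
            @Override
            public void run() {
                // In the article, this is where getLongAftResult would be called.
                fired.incrementAndGet();
                done.countDown();
            }
        };
        timer.schedule(poll, delayMs, periodMs);
        done.await();    // Wait until the task has polled n times.
        timer.cancel();  // Stop polling, as onResult does once the result is complete.
        return fired.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("polled " + pollTimes(3, 50, 100) + " times");
    }
}
```

The first argument to schedule is the initial delay before the first poll and the second is the repeat interval, which is why the article's code waits 5 seconds before the first query and then repeats every 10.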

Running the feature in the application


Create and run an application with a built-in audio transcription feature. Then select an audio file on your device and convert the audio to text.
