Streaming Audio Transcription with Flutter and AssemblyAI

In this blog post I will walk you through streaming audio from a microphone to AssemblyAI's real-time WebSocket service. Streaming gives you near-real-time transcription and a very low-latency user experience: because most of the audio has already been transcribed while the user is still speaking, there is very little delay between pressing the stop button and receiving the finished transcription.

Prerequisites

  • AssemblyAI account
  • Flutter dependencies installed:
    web_socket_channel: ^2.1.0
    flutter_sound: ^9.2.13

The Build

First, we need a method that starts the recording and produces an audio stream. Recording requires microphone permission, so request it when the button is first pressed.
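
The snippets below assume the following imports. Note that the microphone permission request uses the permission_handler package, which is not in the prerequisites list above:

import 'dart:async';
import 'dart:convert';

import 'package:flutter_sound/flutter_sound.dart';
import 'package:permission_handler/permission_handler.dart';
import 'package:web_socket_channel/web_socket_channel.dart';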

Ask permission for the microphone:

await Permission.microphone.request();
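
A slightly fuller sketch checks the result, so recording only starts when access is actually granted:

final status = await Permission.microphone.request();
if (!status.isGranted) {
  // The user denied microphone access; don't start the recorder.
  return;
}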

Start the recorder and create the stream controller:

StreamController<Food> recordingDataController = StreamController<Food>();
FlutterSoundRecorder myRecorder = FlutterSoundRecorder();
// The recorder must be opened before it can start recording.
await myRecorder.openRecorder();
await myRecorder.startRecorder(
  toStream: recordingDataController.sink,
  codec: Codec.pcm16,
  numChannels: 1,
  sampleRate: 16000,
);

These audio settings come from the AssemblyAI documentation.
https://www.assemblyai.com/docs/guides/real-time-streaming-transcription

Next, we establish our WebSocket connection, authenticate, and start streaming audio from the recordingDataController we created above. Once the WebSocket is open we can add audio chunks to the channel. When the recording is complete, we send a special message to tell the service we have sent our last audio chunk and want the completed transcription.

// This token should not be hard coded into your client application. A temporary
// token can be generated from your authenticated backend for your client app.
// https://www.assemblyai.com/docs/guides/real-time-streaming-transcription#creating-temporary-authentication-tokens
var token = '...';
var channel = WebSocketChannel.connect(
  Uri.parse(
      'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=$token'),
);
recordingDataController.stream.listen((buffer) {
  if (buffer is FoodData) {
    // Each audio chunk is base64 encoded and sent as a JSON message.
    String base64Data = base64.encode(buffer.data!);
    channel.sink.add(jsonEncode({'audio_data': base64Data}));
  }
}, onDone: () {
  // Fired when the recording controller is closed: tell AssemblyAI we are done.
  channel.sink.add(jsonEncode({'terminate_session': true}));
});
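
When the user taps stop, one way to finish the session (a minimal sketch using the recorder and controller defined above) is to stop the recorder and close the controller, which fires the onDone callback above and sends the terminate_session message:

await myRecorder.stopRecorder();
await recordingDataController.close(); // triggers onDone above
await myRecorder.closeRecorder(); // release the recorder once you are fully done with it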

Now that we have sent all of our audio and the termination message, we need to receive the transcription messages. In this step we check each message's type and look for SessionTerminated or FinalTranscript to trigger closing the WebSocket and capture the final transcription text.

var finalTranscript = '';
channel.stream.listen((msg) {
  final resData = jsonDecode(msg.toString());
  final messageType = resData['message_type'];
  if (messageType == 'SessionTerminated') {
    channel.sink.close();
  } else if (messageType == 'FinalTranscript') {
    finalTranscript = resData['text'];
    recordingDataController.close();
    channel.sink.close();
  }
}, onDone: () {
  // The channel stream completes after the WebSocket closes, so the transcript is ready here.
  print('final transcription: $finalTranscript');
});

Now that we have this final transcript, we can process it however we want in the app.
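
For example, a minimal sketch inside a StatefulWidget, assuming a hypothetical _transcript field that backs a Text widget:

setState(() {
  _transcript = finalTranscript; // _transcript is a hypothetical state field in your widget
});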

AssemblyAI real-time transcription docs: https://www.assemblyai.com/docs/guides/real-time-streaming-transcription

Happy Coding!

Contact

I'm open to contract projects as a Project Leader or Individual Contributor. Let's chat!

LinkedIn: https://www.linkedin.com/in/davidrichards5/
Email: david.richards.tech (@) gmail.com