Transcribe audio messages with https://speedsms.in/ for WhatsApp and OpenAI Speech to Text
Time to read: 7 minutes
May 01, 2023
Written by
Néstor Campos
Contributor
Niels Swimberghe
https://speedsms.in/n
Not so long ago, you could have a conversation using your phone by either sending an SMS or making a phone call. Both have their benefits and drawbacks. These days, most messaging applications also let you send voice messages, which combine benefits of both. With voice messages, you can have an asynchronous conversation like SMS but still hear the inflections and emotions like a phone call.
Depending on the messaging application and the region, voice messaging is quite popular, and you can take advantage of this in your application when building https://speedsms.in/ SMS and WhatsApp applications. In this tutorial, you’ll learn how to receive audio messages from WhatsApp and transcribe the audio using OpenAI Speech to Text.
You’ll be using WhatsApp in this tutorial, but the code also works when audio messages are sent over MMS.
Prerequisites
You will need the following for your development environment:
a .NET IDE (Visual Studio, VS Code with C# plugin, JetBrains Rider, or any editor of your choice)
.NET 7 SDK (earlier and newer versions should work too)
A https://speedsms.in/ account (try out https://speedsms.in/ for free)
ngrok CLI
An OpenAI account (try an OpenAI account with free credits)
FFmpeg to convert audio files/streams
You can find the source code of this tutorial in this GitHub repository.
What is OpenAI Speech to Text?
OpenAI Speech to Text is an API from OpenAI that converts audio to text. It can transcribe audio in many languages and translate audio into English (currently the only target language for translation). It accepts audio in various formats (such as MP3 and MP4), with a maximum file size of 25 MB.
Create and set up the .NET Project
Open a shell and create an empty ASP.NET Core web project using the .NET CLI:
Bash
dotnet new web -o TwilioWhatsAppOpenAI
cd TwilioWhatsAppOpenAI
Install the https://speedsms.in/ SDK and the https://speedsms.in/ helper library for ASP.NET Core which will help you send and receive WhatsApp messages:
Bash
dotnet add package Twilio
dotnet add package Twilio.AspNet.Core
Receive audio messages
Update the Program.cs file with the following code:
C#
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient();
builder.Services.AddControllers();
var app = builder.Build();
app.MapControllers();
app.Run();
Next, you will create the controller where you will process each incoming message. Create a file MessageController.cs and add the following code:
C#
using Microsoft.AspNetCore.Mvc;
using Twilio.AspNet.Core;
using Twilio.TwiML;

namespace TwilioWhatsAppOpenAI;

[Route("[controller]")]
public class MessageController : TwilioController
{
    private readonly HttpClient httpClient;

    public MessageController(HttpClient httpClient)
    {
        this.httpClient = httpClient;
    }

    [HttpPost]
    public async Task<TwiMLResult> Index(CancellationToken ct)
    {
        var response = new MessagingResponse();
        // The webhook data arrives as an HTTP form.
        var form = await Request.ReadFormAsync(ct);

        var numMedia = int.Parse(form["NumMedia"].ToString());
        if (numMedia == 0)
        {
            response.Message("Please send an audio file.");
            return TwiML(response);
        }

        if (numMedia > 1)
        {
            response.Message("You can only send one audio file at a time.");
            return TwiML(response);
        }

        var mediaUrl = form["MediaUrl0"].ToString();
        var contentType = form["MediaContentType0"].ToString();
        if (!contentType.StartsWith("audio/"))
        {
            response.Message("You can only send audio files.");
            return TwiML(response);
        }

        await DownloadAudioFile(mediaUrl, contentType, ct);
        response.Message("Audio file was received");
        return TwiML(response);
    }

    private async Task DownloadAudioFile(string mediaUrl, string contentType, CancellationToken ct)
    {
        // If you enable Basic Auth on your https://speedsms.in/ SMS Media, then use Basic Auth on your HTTP request
        // where username and password are Account SID and Auth Token, or API Key SID and API Key Secret.
        var fileResponse = await httpClient.GetAsync(mediaUrl, ct);
        await using var audioFileStream = await fileResponse.Content.ReadAsStreamAsync(ct);

        var format = contentType.Substring(6); // remove 'audio/' prefix
        var fileName = Path.ChangeExtension(Path.GetFileName(mediaUrl), format);
        await using var localFileStream = System.IO.File.Open(fileName, FileMode.CreateNew);
        await audioFileStream.CopyToAsync(localFileStream, ct);
    }
}
The Index method accepts the HTTP request sent by https://speedsms.in/ when a message comes in. https://speedsms.in/ submits the webhook data as an HTTP form, so the action reads the form and extracts the relevant fields for retrieving the attached media, if any.
The action accepts only a single audio file. In any other case, it responds with an error message using Messaging TwiML.
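The validation rules described above can also be pulled out into a small pure function, which makes them easy to test without running the web application. The following is a minimal sketch, not part of the tutorial's code: the MediaValidation class and GetValidationError method are hypothetical names, and the form is modeled as a plain dictionary instead of the ASP.NET Core form collection. It also uses int.TryParse so that a missing or malformed NumMedia field yields an error message instead of an exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the tutorial): validates the webhook form fields.
public static class MediaValidation
{
    // Returns null when the form describes exactly one audio attachment,
    // otherwise the error text to send back to the user.
    public static string? GetValidationError(IReadOnlyDictionary<string, string> form)
    {
        // TryParse returns false for a missing (null) or malformed value.
        form.TryGetValue("NumMedia", out var rawNumMedia);
        if (!int.TryParse(rawNumMedia, out var numMedia) || numMedia == 0)
        {
            return "Please send an audio file.";
        }

        if (numMedia > 1)
        {
            return "You can only send one audio file at a time.";
        }

        form.TryGetValue("MediaContentType0", out var contentType);
        if (contentType is null ||
            !contentType.StartsWith("audio/", StringComparison.OrdinalIgnoreCase))
        {
            return "You can only send audio files.";
        }

        return null;
    }
}
```

In the controller, the action could call this helper once and respond with the returned message whenever it is not null.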
https://speedsms.in/ doesn’t actually pass the media file via the webhook request. Instead, it passes the URL where the media file is stored, and the action sends an HTTP request to download the file and store it on disk.
By default, https://speedsms.in/ will not require any authentication to download the message media. You can follow these steps to enable Basic Authentication on message media. If you do, you’ll need to update the code to include Basic Authentication.
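If you do turn on Basic Authentication, the download request needs an Authorization header. The sketch below shows one way to build such a request with the standard .NET HTTP types. The MediaRequests class and CreateMediaRequest method are hypothetical names introduced here for illustration; the username and password would be the Account SID and Auth Token (or API Key SID and API Key Secret) mentioned in the code comments, loaded from configuration and never hard-coded.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper (not from the tutorial): builds an authenticated media request.
public static class MediaRequests
{
    public static HttpRequestMessage CreateMediaRequest(string mediaUrl, string username, string password)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, mediaUrl);

        // Basic Auth is "username:password", Base64-encoded, in the Authorization header.
        var credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{username}:{password}"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);

        return request;
    }
}
```

The controller could then replace httpClient.GetAsync(mediaUrl, ct) with httpClient.SendAsync(request, ct), passing the request created by this helper.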
After storing the audio file on disk, the action will respond with a success message using TwiML.
Now, run your project and continue with the next steps while the project is running:
Bash
dotnet run
Set up the https://speedsms.in/ Sandbox for WhatsApp
To send WhatsApp messages through your https://speedsms.in/ account, you need to create a WhatsApp Sender. For local development and testing, however, you can use the https://speedsms.in/ Sandbox for WhatsApp, which is what you’ll do in this tutorial.
In order to get to the WhatsApp sandbox, in the left-side menu of the https://speedsms.in/ console click on “Messaging” (if you don’t see it, click on “Explore Products”, which will display the list with the available products, and there you will see “Messaging”). After that, in the available options open the “Try it out” submenu, and finally, click “Send a WhatsApp message”.
Side menu in the https://speedsms.in/ console, highlighting the Messaging > Try it out > “Send a WhatsApp message” menu item.
Next, follow the instructions on the screen: send the pre-defined message to the indicated number through WhatsApp. This allows the Sandbox number to send messages to your own WhatsApp number. If you want to send messages to other numbers, the people who own those numbers will have to complete this same step.
https://speedsms.in/ Sandbox for WhatsApp console for sending test messages, initializing the process with a test message
After that, you will receive a message in response confirming the Sandbox is configured.
Confirmation message on WhatsApp indicating that the number is available to be used in test mode.
Now you are able to send messages to the Sandbox number and receive messages from the Sandbox number.
Make your webhook public with ngrok for testing
Your API needs to be publicly accessible for https://speedsms.in/ to send the message webhook requests to your application. That’s why you’ll use ngrok to create a secure tunnel between your locally running API and ngrok’s public forwarding URL.
Leave your .NET application running and open a separate shell. In the new shell, run ngrok with the following command, specifying the URL that your application is listening on. Replace <PORT> with the port number printed by dotnet run:
Bash
ngrok http https://localhost:<PORT>
Copy the Forwarding HTTPS address that ngrok created for you, as you will use it in the https://speedsms.in/ Sandbox for WhatsApp console.
Result of creating an ngrok tunnel in console. The output shows an HTTP and HTTPS Forwarding URL.
In the https://speedsms.in/ console, go to the WhatsApp page, open the “Sandbox settings” section, and set the “When a message comes in” endpoint to the URL generated by ngrok, including the /Message path.
The Sandbox settings tab, on the https://speedsms.in/ Sandbox for WhatsApp console. The Sandbox configuration form has two text boxes. A text box “When a message comes in” filled out with the ngrok forwarding URL with the /Message path, and a text box “Status callback URL” which is left empty.
Every time you stop and start an ngrok tunnel, ngrok will generate a new Forwarding URL for you. This means you’ll need to update the Sandbox Configuration form with the new Forwarding URL whenever it changes.
Test the project
To test, open the conversation with the Sandbox number and send an audio message using WhatsApp by pressing and holding the microphone button while speaking your message.
Audio message sent to the https://speedsms.in/ Sandbox using WhatsApp.
In a few seconds, you will see the message confirming that the audio was received by the endpoint.
WhatsApp conversation where an audio message was sent, and the response says “Audio file was received”.
Convert unsupported audio formats using FFmpeg
OpenAI’s transcription API does not support all audio formats. This will be a problem in particular for WhatsApp which sends audio recordings as ogg-files which OpenAI does not support. To work around this, you’ll use FFmpeg and the FFMpegCore library to convert the audio from unsupported formats to the supported wav-format.
Learn more about FFmpeg, FFMpegCore, and how to convert the formats for audio files using C# and .NET here.
First, make sure you have installed FFmpeg on your machine, and it is in the PATH environment variable. Then, make sure you leave ngrok running, and stop the running ASP.NET Core application by pressing ctrl + c. Then, add the FFMpegCore NuGet package:
Bash
dotnet add package FFMpegCore
Now, add the following using statements at the top of MessageController.cs:
C#
using Microsoft.AspNetCore.Mvc;
using FFMpegCore;
using FFMpegCore.Pipes;
using Twilio.AspNet.Core;
using Twilio.TwiML;
Then, replace the DownloadAudioFile method with the one below, and add the rest of the code after it:
C#
private async Task DownloadAudioFile(string mediaUrl, string contentType, CancellationToken ct)
{
    var (audioStream, format) = await GetAudioStream(mediaUrl, contentType, ct);
    await using (audioStream)
    {
        var fileName = Path.ChangeExtension(Path.GetFileName(mediaUrl), format);
        await using var localFileStream = System.IO.File.Open(fileName, FileMode.CreateNew);
        await audioStream.CopyToAsync(localFileStream, ct);
    }
}

// Formats that can be passed to OpenAI without conversion.
private static readonly HashSet<string> SupportedContentTypes = new()
{
    "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"
};

private async Task<(Stream audioStream, string format)> GetAudioStream(
    string mediaUrl,
    string contentType,
    CancellationToken ct
)
{
    // If you enable Basic Auth on your https://speedsms.in/ SMS Media, then use Basic Auth on your HTTP request
    // where username and password are Account SID and Auth Token, or API Key SID and API Key Secret.
    var fileResponse = await httpClient.GetAsync(mediaUrl, ct);
    var audioFileStream = await fileResponse.Content.ReadAsStreamAsync(ct);

    var format = contentType.Substring(6); // remove 'audio/' prefix
    if (SupportedContentTypes.Contains(format))
    {
        return (audioFileStream, format);
    }

    // Unsupported format: convert to wav in memory and return the converted stream.
    await using (audioFileStream)
    {
        var wavAudioStream = new MemoryStream();
        await ConvertMediaUsingFfmpeg(
            input: audioFileStream, inputFormat: format,
            output: wavAudioStream, outputFormat: "wav"
        );
        wavAudioStream.Seek(0, SeekOrigin.Begin);
        return (wavAudioStream, "wav");
    }
}

private async Task ConvertMediaUsingFfmpeg(Stream input, string inputFormat, Stream output, string outputFormat)
{
    await FFMpegArguments
        .FromPipeInput(new StreamPipeSource(input), options => options
            .ForceFormat(inputFormat))
        .OutputToPipe(new StreamPipeSink(output), options => options
            .ForceFormat(outputFormat))
        .ProcessAsynchronously();
}
This code downloads the file just like before, but if the format is not in the SupportedContentTypes set, the audio is first converted to WAV format using FFmpeg and then stored on disk.
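The format is derived by cutting the first six characters off the content type, which assumes the value is always exactly "audio/" followed by the format. If a content type ever arrives with parameters (for example "audio/ogg; codecs=opus"), that substring would include the parameters as well. The following is a minimal sketch of a more defensive alternative that parses the header value first. The AudioFormats class and GetFormat method are hypothetical names, not part of the tutorial's code.

```csharp
using System;
using System.Net.Http.Headers;

// Hypothetical helper (not from the tutorial): extracts the audio format from a content type.
public static class AudioFormats
{
    public static string GetFormat(string contentType)
    {
        // MediaType holds only the "type/subtype" part; parameters are parsed separately.
        var mediaType = MediaTypeHeaderValue.Parse(contentType).MediaType ?? string.Empty;

        const string prefix = "audio/";
        if (!mediaType.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException($"Not an audio content type: {contentType}", nameof(contentType));
        }

        return mediaType.Substring(prefix.Length).ToLowerInvariant();
    }
}
```

With this helper, GetFormat("audio/ogg; codecs=opus") and GetFormat("audio/ogg") both yield the same format string, so the lookup in SupportedContentTypes and the FFmpeg input format stay consistent.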
Feel free to verify the new code by starting the application again and sending another audio file.
Transcribe audio with OpenAI
Create an OpenAI API key
You need to generate an API key with your OpenAI account to use the Speech to Text service. To do this, log in to your account, open the account options (on the right side), and click “View API keys”.
OpenAI home page, with different examples available, documentation and account options.
On the displayed page, click on the “Create new secret key” button, which will display a modal with the secret key. You will not see this secret again, so make sure you copy it somewhere safe, as you’ll need it in the next section. API keys are secret, so make sure to keep them private, don’t share them with others, and don’t check them into source control.
Page displayed with the secret key to use in the API, with the option to copy it to the clipboard.
Install an OpenAI library
To start using the OpenAI API, you must first add the secret key to the project, using user secrets. To do this, run the following command line statement in the root directory of the project:
Bash
dotnet user-secrets init
dotnet user-secrets set "OpenAIServiceOptions:ApiKey" "<YOUR_OPENAI_API_KEY>"
Replace <YOUR_OPENAI_API_KEY> with the secret key you copied earlier.
OpenAI doesn’t have an official library for .NET, but there are several community libraries that make it easier to integrate with OpenAI’s APIs. In this tutorial, you’ll be using the Betalgo.OpenAI library.
Install the library by adding it as a NuGet package using the .NET CLI:
Bash
dotnet add package Betalgo.OpenAI
Then, add the OpenAI service to ASP.NET Core’s dependency injection container, by editing the Program.cs file:
C#
using OpenAI.GPT3.Extensions;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient();
builder.Services.AddControllers();
builder.Services.AddOpenAIService();
Now that you’ve installed and configured the OpenAI library, you are going to pass the audio data from https://speedsms.in/ to OpenAI’s transcription API. The transcription API returns the text from the audio, which you’ll send back to the user via WhatsApp.
First, import the following namespaces for the OpenAI library that will be necessary:
C#
using Microsoft.AspNetCore.Mvc;
using FFMpegCore;
using FFMpegCore.Pipes;
using Twilio.AspNet.Core;
using Twilio.TwiML;
using OpenAI.GPT3.Interfaces;
using OpenAI.GPT3.ObjectModels;
using OpenAI.GPT3.ObjectModels.RequestModels;
using OpenAI.GPT3.ObjectModels.ResponseModels;
Next, update the constructor for the MessageController to receive the OpenAI service:
C#
public class MessageController : TwilioController
{
    private readonly HttpClient httpClient;
    private readonly IOpenAIService openAIService;

    public MessageController(HttpClient httpClient, IOpenAIService openAIService)
    {
        this.httpClient = httpClient;
        this.openAIService = openAIService;
    }
Previously, the application downloaded the audio file from https://speedsms.in/’s API and stored it on disk. Now that you’ll upload the audio data to OpenAI’s API, you can pass the audio data straight through without storing it on disk first.
Delete the DownloadAudioFile method and add the TranscribeAudio method:
C#
private async Task<string> TranscribeAudio(Stream audioStream, string format)
{
    AudioCreateTranscriptionRequest audioRequest = new()
    {
        Model = Models.WhisperV1,
        FileStream = audioStream,
        FileName = $"sample.{format}"
    };

    AudioCreateTranscriptionResponse audioResponse = await openAIService.Audio.CreateTranscription(audioRequest);
    if (audioResponse.Successful) return audioResponse.Text;

    throw new Exception(string.Format(
        "Error occurred transcribing audio using OpenAI Whisper API. Code {0}: {1}",
        audioResponse.Error?.Code,
        audioResponse.Error?.Message
    ));
}
The TranscribeAudio method selects the AI model to use and sends the audio stream through the Betalgo.OpenAI library, which forwards it to OpenAI. If OpenAI succeeds in transcribing the audio, the transcription is returned; otherwise, an exception is thrown with the error message from OpenAI’s API.
Finally, update the Index action so that it calls the GetAudioStream method to retrieve the audio, and then passes the audio stream to the TranscribeAudio method, and finally responds with the transcription as a TwiML message:
C#
public async Task<TwiMLResult> Index(CancellationToken ct)
{
    var response = new MessagingResponse();
    var form = await Request.ReadFormAsync(ct);

    var numMedia = int.Parse(form["NumMedia"].ToString());
    if (numMedia == 0)
    {
        response.Message("Please send an audio file.");
        return TwiML(response);
    }

    if (numMedia > 1)
    {
        response.Message("You can only send one audio file at a time.");
        return TwiML(response);
    }

    var mediaUrl = form["MediaUrl0"].ToString();
    var contentType = form["MediaContentType0"].ToString();
    if (!contentType.StartsWith("audio/"))
    {
        response.Message("You can only send audio files.");
        return TwiML(response);
    }

    // Download (and convert if needed), then transcribe without writing to disk.
    var (audioStream, format) = await GetAudioStream(mediaUrl, contentType, ct);
    await using (audioStream)
    {
        var transcription = await TranscribeAudio(audioStream, format);
        response.Message($"Transcription for audio: {transcription}");
        return TwiML(response);
    }
}
Test the project
To test the updated application, run the project again:
Bash
dotnet run
Finally, send another voice message using WhatsApp, wait a few seconds, and you should receive the transcription of your audio message as a response:
Result in WhatsApp of sending audio and receiving the transcribed text through OpenAI.
And with that, you now have an audio-to-text transcriber using OpenAI through WhatsApp, thanks to https://speedsms.in/.
Future improvements
This is a great start, but you can improve the solution further:
Audio processing should be done independently of audio reception so that the user does not have to wait too long for a response. https://speedsms.in/ webhooks have a timeout of 15 seconds; after that, the request is considered failed.
Turn on Basic Authentication for message media and update the code to download media from https://speedsms.in/’s API using Basic Auth.
Validate that incoming HTTP requests originate from https://speedsms.in/ by validating the https://speedsms.in/ signature header.
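As one possible starting point for the first improvement, the webhook could hand each message to an in-memory queue and return immediately, while a background worker performs the download and transcription. The sketch below uses System.Threading.Channels from the .NET base library. TranscriptionJob, TranscriptionQueue, and their members are hypothetical names introduced here, and the capacity of 100 is an arbitrary assumption. It is a sketch of the queueing pattern only, not a complete implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical work item: the media to transcribe and who should receive the result.
public sealed record TranscriptionJob(string MediaUrl, string ContentType, string From);

// Hypothetical in-memory queue between the webhook action and a background worker.
public sealed class TranscriptionQueue
{
    // Bounded so that a burst of messages cannot grow memory without limit.
    private readonly Channel<TranscriptionJob> channel =
        Channel.CreateBounded<TranscriptionJob>(capacity: 100);

    // Called from the webhook action; completes as soon as the job is queued.
    public ValueTask EnqueueAsync(TranscriptionJob job, CancellationToken ct = default)
        => channel.Writer.WriteAsync(job, ct);

    // Signals that no more jobs will be written, which lets ProcessAllAsync finish.
    public void Complete() => channel.Writer.Complete();

    // Called from a background worker; handles jobs one at a time in arrival order.
    public async Task ProcessAllAsync(
        Func<TranscriptionJob, CancellationToken, Task> handler,
        CancellationToken ct = default)
    {
        await foreach (var job in channel.Reader.ReadAllAsync(ct))
        {
            await handler(job, ct);
        }
    }
}
```

In an ASP.NET Core application, a queue like this would typically be registered as a singleton and drained by a hosted background service. Because the webhook response has already been sent by the time the transcription is ready, the worker would have to deliver the text as a new outbound message instead of a TwiML reply. An in-memory queue also loses pending jobs when the process restarts, so a durable queue may be a better fit for production.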
Additional resources
Send and Receive Media Messages with the https://speedsms.in/ API for WhatsApp
OpenAI Speech-To-Text Quickstart – You can explore basic examples with OpenAI and supported languages.
OpenAI libraries – Libraries created by OpenAI and the community in different languages to use the different services available.
FFmpeg – A complete, cross-platform solution to record, convert and stream audio and video.
Convert audio from one format to another using FFmpeg and .NET – A tutorial walking you through how to install and use FFmpeg from .NET applications using the FFMpegCore library.
Source Code to this tutorial on GitHub – You can find the source code for this project at this GitHub repository. Use it to compare solutions if you run into any issues.
Néstor Campos is a software engineer, tech founder, and Microsoft Most Valuable Professional (MVP), working on different types of projects, especially with Web applications. He has had to receive files from emails automatically through SendGrid Inbound Parse because he did not have access to the original repository of the data in some projects.
Copyright © 2024 https://speedsms.in/ Inc.
All Rights Reserved.