Azure speech to text read audio

3/18/2023

Microsoft.Authorization/roleAssignments/read

Roles: Access Review Operator Service Role, Cognitive Services Language Owner, Cognitive Services Language Reader, Cognitive Services Language Writer, Cognitive Services LUIS Owner, Cognitive Services LUIS Reader, Cognitive Services LUIS Writer, Cognitive Services OpenAI Contributor, Cognitive Services OpenAI User, Cognitive Services QnA Maker Editor, Cognitive Services QnA Maker Reader, Cognitive Services Speech Contributor, Elastic SAN Reader, Elastic SAN Volume Group Owner, Reservation Purchaser

NotDataActions:
add Microsoft.CognitiveServices/accounts/CustomVoice/trainingsets/files/read
add Microsoft.CognitiveServices/accounts/CustomVoice/datasets/files/read
add Microsoft.CognitiveServices/accounts/CustomVoice/trainingsets/utterances/read

Old Description: 'This is a role that can create, read, change and delete batch transcriptions, do real time transcriptions and list or get other speech resources.'

DataActions:
add Microsoft.CognitiveServices/accounts/SpeechServices/text-dependent/*/action
add Microsoft.CognitiveServices/accounts/SpeechServices/text-independent/*/action
add Microsoft.CognitiveServices/accounts/CustomVoice/*/read
add Microsoft.CognitiveServices/accounts/CustomVoice/evaluations/*
add Microsoft.CognitiveServices/accounts/CustomVoice/longaudiosynthesis/*

New Description: 'Access to the real-time speech recognition and batch transcription APIs, real-time speech synthesis and long audio APIs, as well as to read the data/test/model/endpoint for custom models, but can't create, delete or modify the data/test/model/endpoint for custom models.'

NotDataActions:
remove Microsoft.CognitiveServices/accounts/CustomVoice/trainingsets/files/read
remove Microsoft.CognitiveServices/accounts/CustomVoice/trainingsets/utterances/read
add Microsoft.CognitiveServices/accounts/CustomVoice/datasets/utterances/read

Actions:
add Microsoft.CognitiveServices/*/read

Change: Description, DataActions, NotDataActions

Access to the real-time speech recognition and batch transcription APIs, real-time speech synthesis and long audio APIs, as well as to read the data/test/model/endpoint for custom models, but can't create, delete or modify the data/test/model/endpoint for custom models.

Actions:
add Microsoft.Authorization/roleAssignments/read
add Microsoft.Authorization/roleDefinitions/read

DataActions:
add Microsoft.CognitiveServices/accounts/AudioContentCreation/*

For a customer, I needed to create a POC where we wanted to run the Azure Speech service Speech-to-Text against the Mozilla Common Voice project, which is a collection of MP3 files of people speaking in different languages.

Currently Speech-to-Text can't work with MP3 files; the suggested solution is to install GStreamer and invoke it as described in the official documentation. I wanted instead a solution that doesn't require installing any software, so I've chosen to adopt NAudio, which is a popular NuGet package.

In the first version, the MP3 file is converted to WAV and saved to a temporary file, which is deleted at the end of the processing. The relevant statements are:

Console.WriteLine("Application started.");
String tempFileName = Path.GetTempFileName();
Console.WriteLine($"Converting ");
Console.WriteLine("Application finished.");

Note that the second version, with in-memory streams, is a bit slower than the version with the temporary file. Also, the second version hardcodes the bitrate of the generated WAV file, while the first version lets the recognizer detect it.
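The temporary-file approach can be sketched roughly as follows. This is a minimal sketch, not the original code from this post: it assumes the NAudio and Microsoft.CognitiveServices.Speech NuGet packages are referenced, that it runs on Windows (NAudio's Mp3FileReader relies on the system MP3 codec), and that the key and region come from the hypothetical environment variables SPEECH_KEY and SPEECH_REGION. The file name sample.mp3 is also just a placeholder.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;

class Program
{
    static async Task Main(string[] args)
    {
        // Assumptions: key, region and input file are placeholders for illustration.
        string subscriptionKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
        string region = Environment.GetEnvironmentVariable("SPEECH_REGION");
        string mp3Path = args.Length > 0 ? args[0] : "sample.mp3";

        Console.WriteLine("Application started.");

        // Temporary file that receives the decoded WAV data.
        string tempFileName = Path.GetTempFileName();
        try
        {
            Console.WriteLine($"Converting {mp3Path}");

            // Decode the MP3 to PCM and write it out as a WAV file.
            using (var reader = new Mp3FileReader(mp3Path))
            {
                WaveFileWriter.CreateWaveFile(tempFileName, reader);
            }

            // Hand the WAV file to the recognizer; the format is read from the WAV header.
            var speechConfig = SpeechConfig.FromSubscription(subscriptionKey, region);
            using (var audioConfig = AudioConfig.FromWavFileInput(tempFileName))
            using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                // RecognizeOnceAsync returns after the first recognized utterance.
                SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
                Console.WriteLine($"{result.Reason}: {result.Text}");
            }
        }
        finally
        {
            // Delete the temporary file at the end of the processing.
            File.Delete(tempFileName);
        }

        Console.WriteLine("Application finished.");
    }
}
```

RecognizeOnceAsync stops after the first utterance, which is probably enough for short clips; longer recordings would likely need continuous recognition instead.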
