
I just want to know whether there are any built-in or external libraries in Java or C# that let me take an audio file, parse it, and extract the text from it.

I need to build an application that does this, but I don't know where to start.

Andrew Thompson
Amira Elsayed Ismail
  • Definitely not built-in. I take it that you want to extend your question to libraries you can use from Java or C#. – Thilo Oct 18 '10 at 10:46
  • Thanks, Mr. Thilo. OK, if you know of any external C# or Java libraries that can do what I want, I would appreciate it if you told me. Thanks in advance. – Amira Elsayed Ismail Oct 18 '10 at 10:56
  • This might help: http://java.sun.com/products/java-media/speech/reference/codesamples/index.html – jmj Oct 18 '10 at 11:02
  • In C# you can use the Speech API; refer to this: http://msdn.microsoft.com/en-us/library/ee125077%28v=VS.85%29.aspx – Vyasdev Meledath Oct 18 '10 at 11:14
  • Extract text from audio?! Are you for real??? – Cipi Oct 18 '10 at 11:15
  • @Cipi OCR extracts text from images, and there's plenty of work being done to do the same for audio. Just check YouTube's computer-generated closed captioning. For the laughs, I mean. They're horrible, but so was OCR at the beginning. –  Oct 18 '10 at 12:14

5 Answers


Here are some of your options:

Ohad Schneider

Here is a complete example using C# and System.Speech.

The code can be divided into two main parts:

  • configuring the SpeechRecognitionEngine object (and its required elements);
  • handling the SpeechRecognized and SpeechHypothesized events.

Step 1: Configuring the SpeechRecognitionEngine

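using System.Speech.Recognition;

// Create the engine and take input from the default audio device (the microphone).
_speechRecognitionEngine = new SpeechRecognitionEngine();
_speechRecognitionEngine.SetInputToDefaultAudioDevice();
// A DictationGrammar allows free-form speech instead of a fixed list of commands.
_dictationGrammar = new DictationGrammar();
_speechRecognitionEngine.LoadGrammar(_dictationGrammar);
// Keep recognizing phrases until RecognizeAsyncStop() or RecognizeAsyncCancel() is called.
_speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);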

At this point your object is ready to start transcribing audio from the microphone. You need to handle some events though, in order to actually get access to the results.

Step 2: Handling the SpeechRecognitionEngine Events

_speechRecognitionEngine.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized); // detach first so a handler is never attached twice
_speechRecognitionEngine.SpeechHypothesized -= new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);
_speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);

private void SpeechHypothesizing(object sender, SpeechHypothesizedEventArgs e)
{
    string realTimeResults = e.Result.Text; // real-time (interim) results from the engine
}

private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    string finalAnswer = e.Result.Text; // final answer from the engine
}

That’s it. If you want to use a pre-recorded .wav file instead of a microphone, you would use

_speechRecognitionEngine.SetInputToWaveFile(pathToTargetWavFile);

instead of

_speechRecognitionEngine.SetInputToDefaultAudioDevice();

There are a bunch of different options in these classes and they are worth exploring in more detail.
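The SpeechHypothesized/SpeechRecognized split is a general pattern: the engine pushes interim guesses to one set of handlers and the committed result to another. System.Speech is a .NET API, so for Java readers here is the same idea as a rough sketch in plain Java. `RecognizerEvents` and its `simulate` method are hypothetical stand-ins that fake a recognizer by replaying the words of a string; no audio is processed and this is not a Java speech library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a speech engine: it only demonstrates the event
// model (interim vs. final results), it does not recognize any audio.
public class RecognizerEvents {
    private final List<Consumer<String>> hypothesizedHandlers = new ArrayList<>();
    private final List<Consumer<String>> recognizedHandlers = new ArrayList<>();

    // Analogous to subscribing to SpeechHypothesized (interim guesses).
    public void onHypothesized(Consumer<String> handler) {
        hypothesizedHandlers.add(handler);
    }

    // Analogous to subscribing to SpeechRecognized (final result).
    public void onRecognized(Consumer<String> handler) {
        recognizedHandlers.add(handler);
    }

    // Fakes an engine that refines its guess word by word, then commits.
    public void simulate(String utterance) {
        StringBuilder partial = new StringBuilder();
        for (String word : utterance.split(" ")) {
            if (partial.length() > 0) {
                partial.append(' ');
            }
            partial.append(word);
            for (Consumer<String> h : hypothesizedHandlers) {
                h.accept(partial.toString());
            }
        }
        for (Consumer<String> h : recognizedHandlers) {
            h.accept(utterance);
        }
    }

    public static void main(String[] args) {
        RecognizerEvents engine = new RecognizerEvents();
        engine.onHypothesized(text -> System.out.println("interim: " + text));
        engine.onRecognized(text -> System.out.println("final:   " + text));
        engine.simulate("hello speech world");
    }
}
```

Running `main` prints three interim lines followed by one final line. A UI would typically overwrite a live caption with each interim result and keep only the final one, which is why the two events are handled separately.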

http://ellismis.com/2012/03/17/converting-or-transcribing-audio-to-text-using-c-and-net-system-speech/

bulltorious

You might check out the Microsoft Speech API. I think they provide an SDK that you can use for this.

jassuncao

For Java, it seems there is a solution from Sun: javax.speech.recognition, part of the Java Speech API (JSAPI).

Grant Peters

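You can use SoX (the Swiss Army knife of sound processing programs) to convert an audio file into a text file of numeric values describing the sound (sample amplitude over time). Note that this is a numeric dump of the waveform, not a transcription of the spoken words.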

I have done it for a previous project but don't know the exact command options.

Here is a link to the project: http://sox.sourceforge.net/Main/HomePage
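If I remember the SoX documentation correctly, giving the output file a `.dat` extension (for example `sox input.wav output.dat`) produces this kind of text dump; check the SoX manual for the exact header lines and column layout. As a rough Java sketch of the same idea using only javax.sound.sampled, the class below assumes 16-bit signed mono PCM input and emits one `time amplitude` line per sample, with time in seconds and amplitude scaled to [-1, 1). `PcmToText` and its helper `monoPcm16` are hypothetical names, and the output is not byte-for-byte SoX's format.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PcmToText {

    // Hypothetical helper: wraps 16-bit samples in an in-memory little-endian
    // mono PCM stream so the converter can be tried without a .wav file.
    public static AudioInputStream monoPcm16(short[] samples, float sampleRate) {
        byte[] bytes = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            bytes[2 * i] = (byte) samples[i];            // low byte
            bytes[2 * i + 1] = (byte) (samples[i] >> 8); // high byte
        }
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        // Length is given in sample frames (one frame = one 16-bit sample here).
        return new AudioInputStream(new ByteArrayInputStream(bytes), format, samples.length);
    }

    // Converts 16-bit signed mono PCM into "time amplitude" text lines.
    public static List<String> toTextLines(AudioInputStream in) {
        AudioFormat format = in.getFormat();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                || format.getSampleSizeInBits() != 16 || format.getChannels() != 1) {
            throw new IllegalArgumentException("expected 16-bit signed mono PCM, got " + format);
        }
        byte[] data;
        try {
            data = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        boolean bigEndian = format.isBigEndian();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i + 1 < data.length; i += 2) {
            int high = bigEndian ? data[i] : data[i + 1];
            int low = bigEndian ? data[i + 1] : data[i];
            short sample = (short) ((high << 8) | (low & 0xFF));
            double seconds = (i / 2) / (double) format.getSampleRate();
            lines.add(String.format(Locale.ROOT, "%.6f %.6f", seconds, sample / 32768.0));
        }
        return lines;
    }

    public static void main(String[] args) {
        short[] samples = {0, 16384, -16384, 32767};
        for (String line : toTextLines(monoPcm16(samples, 8000f))) {
            System.out.println(line);
        }
    }
}
```

To try it on a real recording, `AudioSystem.getAudioInputStream(new File("speech.wav"))` returns an `AudioInputStream` that can be passed to `toTextLines` (the file name is a placeholder); stereo or 8-bit files would need converting first. As the answer says, the result is numbers, so turning it into words would still require a speech recognizer.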

Ivelin