
How can I detect specific words in an audio file?

I have a lot of audio files (all the same codec), and each file is only about 15 seconds long. (Note: all audio files feature the same person with the same accent.)

For example:

test1.mp3 plays "Hello Tom, what are you doing today?"

test2.mp3 plays "Hello Paul, what are you doing today?"

test3.mp3 plays "Good morning John - It is lovely weather today"

I need a way to detect the words "lovely weather" or "what are you" in each audio file.

I may have 100 audio files that say "what are you doing today?" and other files that say "It is lovely weather today" - I just need to know what the status/type of each file is.

Is there a way to check whether specific frequency bits exist in a file, rather than using a voice recognition tool?

halfer
I'll-Be-Back
  • Can you give us an idea of what your research has turned up so far? – halfer May 03 '12 at 15:15
  • `What is the solution to this?` - The solution is to not use PHP. It is not a good choice for this sort of thing: it requires a *lot* of low-level mathematical operations, and PHP is not very efficient at them. – DaveRandom May 03 '12 at 15:15
  • I agree with Dave. You might find some ideas to handle your problem here: http://stackoverflow.com/questions/23592/how-do-i-search-content-within-audio-files-streams – Deratrius May 03 '12 at 15:17
  • Look for speech recognition programs/libraries/web services in other languages, then call them via the command line. You won't find speech recognition stuff directly in PHP. – goat May 03 '12 at 15:17
  • I should think by _using_ PHP, the OP means they are happy to use libraries/modules from PHP but not necessarily written in PHP `:)`. – halfer May 03 '12 at 15:19
  • Speech recognition is not a good solution, and it is too complex. OK, what is the alternative to PHP? ... I may have, say, 20 audio files that say "what are you doing today?" and 34 files that say "It is lovely weather today" - I just need to detect the status of each file so I can flag it in the database. – I'll-Be-Back May 03 '12 at 15:22
  • If you only need to do this with a couple of files, you might want to try the Google Speech API. Here's a good article on how to use it, with Perl examples: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ I've actually done this with curl and PHP. You just need to POST the sound file in the correct format. – marcelog May 03 '12 at 15:26
  • Consider changing the title to speech recognition. – goat May 03 '12 at 15:27
  • PHP is the wrong choice for this task. – Nadh May 03 '12 at 15:28
  • @marcelog I have a couple of thousand files :) All the audio has the same sentences and the same voice. I just need a way to detect what the status of each file is. – I'll-Be-Back May 03 '12 at 15:29
  • @halfer I am looking for a solution to check whether the needle's frequency bits exist in the haystack's frequency bits. It is as simple as that. – I'll-Be-Back May 03 '12 at 15:44
  • @user791022 - not sure which of my comments you're responding to. I presume the second one, in which I was defending your use of the PHP tag? – halfer May 03 '12 at 15:48
  • No, no, it is **not** as simple as that – dan-lee May 03 '12 at 15:50
  • See http://stackoverflow.com/a/6351055/90236 – Michael Levy May 03 '12 at 16:41
  • Misleading title. Word detection/recognition is a different topic from frequency detection. – hotpaw2 May 03 '12 at 17:10
  • possible duplicate of [Text-to-speech (voice generation) and speech-to-text (voice recognition) APIs?](http://stackoverflow.com/questions/6348770/text-to-speech-voice-generation-and-speech-to-text-voice-recognition-apis) – Ben May 03 '12 at 21:27
  • @user791022: If the files have the same sentence and voice, what's different about them? Are they identical files, or many instances of the same speaker saying the same thing? Or something else? – tom10 May 03 '12 at 21:27

1 Answer


You are essentially asking "How can I do general-purpose speech recognition?"

The solutions are:

If your platform provides speech recognition out of the box, use that. Microsoft Windows does, for example: http://msdn.microsoft.com/en-us/library/hh323805.aspx

If your platform does not, then you need to integrate a third-party speech recognition package, such as Lernout & Hauspie (now Nuance), Dragon, etc. This will likely involve paying money.
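
Whichever recognizer you end up with, the flagging step you describe is then plain phrase matching on the transcript it returns. Here is a minimal sketch in Python, assuming the transcript is already available as a string. The `PHRASES` table, its labels, and the function names are hypothetical and only for illustration; they are not part of any speech package.

```python
import re

# Hypothetical phrase-to-label table; adjust the phrases and labels to your data.
PHRASES = {
    "what are you doing today": "question",
    "lovely weather": "weather",
}


def normalize(text):
    """Lowercase the text and replace punctuation with spaces,
    so matching tolerates differences in recognizer formatting."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def classify_transcript(transcript, phrases=PHRASES):
    """Return the label of the first phrase found in the transcript, or None."""
    # Pad with spaces so phrases only match on whole-word boundaries.
    cleaned = " " + normalize(transcript) + " "
    for phrase, label in phrases.items():
        if " " + normalize(phrase) + " " in cleaned:
            return label
    return None


if __name__ == "__main__":
    # The transcripts here stand in for whatever the recognizer returns.
    for name, transcript in [
        ("test1.mp3", "Hello Tom, what are you doing today?"),
        ("test3.mp3", "Good morning John - It is lovely weather today"),
    ]:
        print(name, classify_transcript(transcript))
```

The label returned for each file is what you would write to the database. Recognizers make mistakes, so in practice you may want looser matching (for example, requiring only most of the words in a phrase), but that depends on how clean your transcripts turn out to be.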

Edit: I have flagged this as a duplicate of "Text-to-speech (voice generation) and speech-to-text (voice recognition) APIs?", which has a comprehensive answer to "how can I do speech recognition".

Community
Ben