2

I have a bunch of recordings of a person reading a simple sentence ("hello world"), stored as WAV files. How can I split each WAV file into two WAV files, each containing one word ("hello" and "world"), by automatically recognizing the gap between the words? Unfortunately I was unable to find a tool that does this for me, so I will write C code to do it. As far as I understand, the gaps should show up as low numeric values in the WAV file; is that correct? I know how to split the files; I would be glad to get an approach to the gap recognition problem. Thank you!

Douglas B. Staple
JavaSheriff

3 Answers

3

http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/

I am sure this is the link you need.

 sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart

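With these parameters, SoX starts a new output file whenever it detects 5 or more seconds of silence (audio below the 1% threshold). You’ll end up with output files named out001.wav, out002.wav, and so on. To split on the short gaps between words, the 5.0 duration would need to be much smaller.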

Yash
2

The way I approach this kind of task is by breaking the wav file into blocks of, say, 0.05 seconds each, computing the RMS amplitude of each block, and comparing the RMS amp to a threshold. If the recording is done under carefully controlled conditions, and the volume of speech relatively well normalized, the threshold may be a static value, but another way to do it is dynamically, checking for a block that is substantially louder than the previous block. You then consider the over-threshold block to be the start of a word.

However, in casual speech, there may not be much of a pause between words. If I say "helloworld" to you without a pause, you can understand me easily.

RMS amplitude is defined as the square root of the average-over-time of the squares of the individual samples.
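As a rough illustration of the block-and-threshold idea above, here is a minimal C sketch. It assumes the audio has already been decoded into an array of mono 16-bit samples; the function names, the block length, and the threshold are all hypothetical choices for illustration, not values from this answer. Since the RMS is the square root of the mean square, comparing the RMS to a threshold is the same as comparing the mean square to the squared threshold, so the sketch skips the square root.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean of the squared samples in one block.
 * The RMS amplitude is the square root of this value. */
double block_mean_square(const int16_t *samples, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = (double)samples[i];
        sum_sq += s * s;
    }
    return sum_sq / (double)n;
}

/* Scan the signal in blocks of block_len samples. Each time a block's RMS
 * rises from below rms_threshold to at or above it, store the index of the
 * block's first sample in starts[]. Returns the number of indices stored
 * (never more than max_starts).
 * Assumptions: mono 16-bit samples, rms_threshold >= 0, static threshold. */
size_t find_word_starts(const int16_t *samples, size_t n_samples,
                        size_t block_len, double rms_threshold,
                        size_t *starts, size_t max_starts)
{
    if (block_len == 0)
        return 0;

    /* Compare in the squared domain: rms >= t  <=>  mean_square >= t*t. */
    double threshold_sq = rms_threshold * rms_threshold;
    size_t count = 0;
    int was_loud = 0;

    for (size_t pos = 0; pos < n_samples; pos += block_len) {
        size_t remaining = n_samples - pos;
        size_t len = remaining < block_len ? remaining : block_len;
        int is_loud = block_mean_square(samples + pos, len) >= threshold_sq;

        if (is_loud && !was_loud && count < max_starts)
            starts[count++] = pos;   /* quiet -> loud transition */
        was_loud = is_loud;
    }
    return count;
}
```

The dynamic variant described above would compare each block to the previous block instead of to a fixed value. Either way, a suitable threshold depends on the recording and probably has to be found by experiment.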

Russell Borogove
  • So will the pseudocode be like this?
    `SAMPLE_SIZE = 0.05; for(int idxFile=0;idxFile
    – JavaSheriff Oct 22 '11 at 00:46
  • No newlines in comments, unbelievable... [link](http://meta.stackexchange.com/questions/197/how-about-newlines-within-comments) – JavaSheriff Oct 22 '11 at 00:54
  • Samples are not seconds. Samples are not bytes. Otherwise that's VERY roughly the idea. – Russell Borogove Oct 22 '11 at 01:15
  • Sorry for my ignorance, how do I determine the sample size? Is this info in the WAV header? – JavaSheriff Oct 24 '11 at 02:36
  • Yes. If you're not already familiar with audio signal processing work, you may be out of your depth. At a high level, what do you want to accomplish with this? – Russell Borogove Oct 24 '11 at 17:09
  • Thanks for your patience. It’s just a simple, one-time task of splitting WAV files for software I am writing. I used to split these files with software (Audacity) until I got tired of that, so I am trying to split the files using C code (but this is not as easy as I expected…) – JavaSheriff Oct 24 '11 at 19:12
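
On the question in the comments about where the sample information lives: the sketch below reads the channel count, sample rate, and bits per sample from a WAV header held in memory, then converts a block duration in seconds into a number of sample frames. It assumes the simplest "canonical" 44-byte PCM layout, with the `fmt ` chunk directly after the RIFF header; real files can contain extra chunks, which this does not handle. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of interest from the "fmt " chunk (hypothetical struct name). */
struct wav_format {
    uint16_t channels;
    uint32_t sample_rate;      /* sample frames per second */
    uint16_t bits_per_sample;
};

/* WAV header fields are stored little-endian. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a canonical 44-byte PCM WAV header.
 * Returns 0 on success, -1 if the buffer is too short or the layout
 * is not the assumed one ("RIFF", "WAVE", then "fmt " at offset 12). */
int parse_wav_header(const unsigned char *hdr, size_t len,
                     struct wav_format *out)
{
    if (len < 44)
        return -1;
    if (memcmp(hdr, "RIFF", 4) != 0 ||
        memcmp(hdr + 8, "WAVE", 4) != 0 ||
        memcmp(hdr + 12, "fmt ", 4) != 0)
        return -1;

    out->channels        = read_le16(hdr + 22);
    out->sample_rate     = read_le32(hdr + 24);
    out->bits_per_sample = read_le16(hdr + 34);
    return 0;
}

/* Number of sample frames in a block of the given duration,
 * rounded to the nearest whole frame. */
size_t block_len_frames(uint32_t sample_rate, double seconds)
{
    return (size_t)((double)sample_rate * seconds + 0.5);
}
```

With a 44100 Hz file, a 0.05 second block works out to 2205 sample frames. Each frame holds one sample per channel, and each sample occupies `bits_per_sample / 8` bytes, which is why samples are neither seconds nor bytes.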
1

See this answer about note onset detection (detecting the start and end of musical notes in a WAV file is exactly the same problem as detecting the start and end of spoken words in a WAV file).

Please note, however, that the task you've set for yourself is essentially impossible without extremely sophisticated (and not yet in existence) artificial intelligence. When a person speaks in a recording, there usually are not gaps between individual words that are numerically any different from the gaps between individual syllables within multi-syllabic words.

MusiGenesis