
I have a sound sample of myself saying one word, for example "apple". I also have a longer audio file (~30 minutes), and I want to find the points in that longer file where I say "apple". So far I have two ideas. The first is to use speech recognition and search the resulting text (but the Google/Azure speech services limit free use). The second is to use a Fourier transform to look for similarities, after dividing the longer audio into smaller samples. Do you have any idea how I can do this?

piotrek
  • @RuudHelderman I think not :-| These speech recognition APIs are not free for unlimited use. Also, I don't use English there :-) – piotrek Nov 05 '21 at 22:30
  • The answer may not be helpful, but I'm afraid that is [no excuse to ask the same question again](https://meta.stackoverflow.com/q/330546). I find the question interesting, but it may be _too broad_. You are encouraged to make an attempt following either approach, then ask a _specific_ question about any hurdles you encounter. If you have any language demands, then please make this part of the question. – Ruud Helderman Nov 05 '21 at 22:47
  • @RuudHelderman I agree the question should be more focused / improved, but disagree with your reasoning. If the OP wishes for a non-proprietary method (fair enough!), that's a good enough reason for having this question. That being said, I'd expect the OP to translate some of his ideas into code so we can go from there. Currently it's too broad for a programming question. Speech-to-text is indeed one way; the other could be audio fingerprinting. Check e.g. https://github.com/Uberi/speech_recognition with CMU Sphinx – Lukasz Tracewski Nov 06 '21 at 11:55
  • See https://stackoverflow.com/questions/49409440/how-to-find-what-time-a-part-of-audio-starts-and-ends-in-another-audio – Scott Stensland Nov 06 '21 at 15:01
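
The Fourier idea from the question can be tried as a normalized cross-correlation computed with an FFT. The sketch below is only a starting point and makes several assumptions: both recordings are already loaded as mono NumPy arrays at the same sample rate, and the function names `cross_correlate_fft` and `find_word` are made up for illustration. Matching raw waveforms like this tends to work when the short sample was cut from the same recording; a separately spoken "apple" will probably need feature-based matching or speech recognition instead.

```python
import numpy as np


def cross_correlate_fft(signal, template):
    """Sliding dot product of `template` over `signal`, computed via FFT.

    Returns one value per position where the template fits entirely
    inside the signal (length: len(signal) - len(template) + 1).
    """
    n = len(signal) + len(template) - 1
    size = 1 << (n - 1).bit_length()  # zero-pad to the next power of two
    spectrum = np.fft.rfft(signal, size) * np.conj(np.fft.rfft(template, size))
    corr = np.fft.irfft(spectrum, size)
    return corr[: len(signal) - len(template) + 1]


def find_word(signal, template):
    """Return (best_offset_in_samples, score) for the best template match.

    The score is a normalized correlation: close to 1.0 means the window
    is (a scaled copy of) the template, close to 0 means no similarity.
    """
    signal = np.asarray(signal, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    m = len(template)

    corr = cross_correlate_fft(signal, template)

    # Energy of every length-m window of the signal, via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    window_energy = csum[m:] - csum[:-m]

    denom = np.sqrt(window_energy) * np.linalg.norm(template)
    # Near-silent windows get a score of 0 instead of dividing by ~0.
    score = np.where(denom > 1e-8, corr / np.maximum(denom, 1e-8), 0.0)

    best = int(np.argmax(score))
    return best, float(score[best])
```

To turn the result into a timestamp, divide the returned offset by the sample rate (for example `offset / 16000` for 16 kHz audio). To find every occurrence rather than only the best one, one option is to keep all positions where the score exceeds a threshold chosen by experiment.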

0 Answers