How do I make Python load a big (2-hour) wave file and convert its contents into a time-frequency array?

Question

I would like to access the array with something like array[5000][440], meaning 5000 ms from the start and 440 Hz, and get back the amplitude of that frequency at that position.

I could not find anything like that here; if it exists, please point me to it.

I think this gets you most of the way there? http://stackoverflow.com/questions/3694918/how-to-extract-frequency-associated-with-fft-values-in-python – gravitron, Jan 17 '12 at 18:07
Well, not yet, unfortunately. I am missing the link between what they offer and what I want. The first sample gets me, for instance, "(0.27440469538+0.908302073062j) * exp(2 pi i t * 0.263687742847)". What is that supposed to say? – Zurechtweiser, Jan 17 '12 at 18:19
To complete what gravitron posted: http://stackoverflow.com/questions/2063284/what-is-the-easiest-way-to-read-wav-files-using-python-summary. With those two resources, you should be able to do what you want. – LBarret, Jan 17 '12 at 19:30

score 2 · Accepted Answer · answered Jan 17 '12 at 22:51

You basically want a spectrogram. To get you started, go through your sound file in small chunks, where each chunk is, say, 1/10th of a second, and FFT each of these chunks. (Then, of course, to look up 5000ms and 440Hz, go to the FFT of the appropriate chunk.)
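
A minimal sketch of that chunk-and-FFT approach, assuming NumPy is available and the file is 16-bit PCM. The names `Spectrogram`, `spectrogram_from_wav`, and `at` are made up for illustration, and reading the file chunk by chunk with the standard `wave` module is just one way to avoid loading two hours of audio at once:

```python
import wave

import numpy as np


class Spectrogram:
    """Hypothetical container: one row of FFT magnitudes per chunk."""

    def __init__(self, magnitudes, rate, chunk_frames):
        self.magnitudes = magnitudes      # shape: (chunks, frequency bins)
        self.rate = rate                  # samples per second
        self.chunk_frames = chunk_frames  # samples per chunk

    def at(self, ms, hz):
        """Magnitude in the chunk containing `ms`, at the bin nearest `hz`."""
        row = int(ms * self.rate // 1000) // self.chunk_frames
        col = int(round(hz * self.chunk_frames / self.rate))
        return self.magnitudes[row, col]


def spectrogram_from_wav(path, chunk_seconds=0.1):
    """Read a 16-bit PCM WAV file chunk by chunk and FFT each chunk."""
    rows = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        chunk_frames = int(round(rate * chunk_seconds))
        while True:
            raw = wav.readframes(chunk_frames)
            samples = np.frombuffer(raw, dtype="<i2")
            if len(samples) < chunk_frames * channels:
                break  # end of file; an incomplete final chunk is dropped
            mono = samples.reshape(-1, channels).mean(axis=1)  # mix to mono
            rows.append(np.abs(np.fft.rfft(mono)))
    return Spectrogram(np.array(rows), rate, chunk_frames)
```

With a placeholder file name, `spectrogram_from_wav("recording.wav").at(5000, 440)` would then give the magnitude near 440 Hz in the chunk containing the 5000 ms mark. For a 2-hour file at 44.1 kHz this comes to 72,000 chunks of 2,206 bins each, roughly 1.3 GB as float64, so storing float32 or keeping only the frequency range of interest may be worth considering.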

tom10

Actually I want to detect voice. But first I need to find some patterns. If there is some voice detection available I could skip that. – Zurechtweiser Jan 17 '12 at 23:39
What do you mean by "detect voice"? Do you mean identify when there's a voice vs when there's background noise? – tom10 Jan 18 '12 at 02:27
Yes that's what I mean. Voice detection, not recognition. Only detecting. – Zurechtweiser Jan 18 '12 at 07:39
This is probably not too hard, but it depends entirely on the background (that is, everything that's not voice). If the background is relative silence, then it's easy, of course: just look at the amplitude. If it's noise, look for structures in each FFT like broadened peaks or harmonic structure. Not all phonemes will be easy to identify as voice ("sh", "t", etc.), but vowels and others probably won't be too bad; and, of course, your time slices will generally include mixes, but all in all this seems possible. Start by plotting the spectrograms to visualize signal vs background. – tom10 Jan 18 '12 at 15:12
Btw, matplotlib has a spectrogram function called "specgram". It's an easy place to start. (Btw, people often think that they'll just calculate without bothering to plot things out first. This *never* works with problems like this. Start with the plots.) – tom10 Jan 18 '12 at 17:32
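
For the quiet-background case mentioned in the comments above, looking at the amplitude could be as simple as the following sketch. It assumes NumPy and mono samples already loaded into an array; `active_chunks` and `threshold_ratio` are illustrative names, and the 10% threshold is an arbitrary starting point to tune against plots, not a recommended value:

```python
import numpy as np


def active_chunks(samples, chunk_len, threshold_ratio=0.1):
    """Flag chunks whose RMS amplitude exceeds a fraction of the loudest chunk."""
    # Drop trailing samples that do not fill a whole chunk.
    usable = len(samples) - len(samples) % chunk_len
    chunks = np.asarray(samples[:usable], dtype=float).reshape(-1, chunk_len)
    rms = np.sqrt((chunks ** 2).mean(axis=1))
    return rms > threshold_ratio * rms.max()
```

This only separates loud from quiet; with a noisy background it will flag the noise too, which is where the spectral structure described above comes in.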

score 0 · Answer 2 · answered Jan 17 '12 at 23:20

You're operating under a couple of misconceptions.

You can't get the frequency of a wave at a particular point in time. You need to select a window of time, including many points before and after the point of interest. The more points you include, the more resolution you'll have in your frequency breakdown, though the less precisely you can place the result in time. You'll need to run some sort of windowing function on those points, then subject them to an FFT.

Once you have the results of the FFT, each number will correspond to a frequency, but you don't have any control over which frequency each output represents: that is already determined by the sampling frequency of your signal combined with the number of samples. I'm afraid I don't have the conversion formula at hand. Each frequency will have two components, a real part and an imaginary part, and the amplitude will be sqrt(r**2+i**2).
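
For reference, the conversion is linear: with sampling rate `rate` and `n` samples in the window, output `k` of the FFT corresponds to `k * rate / n` Hz, so the bins are `rate / n` Hz apart. A small sketch with NumPy, using a synthetic 440 Hz tone and a Hann window (the sampling rate and window length are assumptions chosen for the example):

```python
import numpy as np

rate = 8000  # samples per second (assumed for this example)
n = 800      # samples in the window, i.e. 0.1 s, giving 10 Hz per bin

t = np.arange(n) / rate
signal = np.sin(2 * np.pi * 440 * t)      # synthetic 440 Hz test tone

windowed = signal * np.hanning(n)         # taper the edges to reduce leakage
spectrum = np.fft.rfft(windowed)          # one complex value per frequency bin
freqs = np.fft.rfftfreq(n, d=1.0 / rate)  # frequency in Hz of each bin

amplitudes = np.abs(spectrum)             # same as sqrt(real**2 + imag**2)
peak_hz = freqs[np.argmax(amplitudes)]    # frequency of the strongest bin
```

Doubling `n` halves the bin spacing but also doubles the stretch of time each FFT describes, which is the trade-off behind choosing a window length.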

Mark Ransom

What does sqrt(r**2+i**2) mean in Python code? Are you talking about http://upload.wikimedia.org/wikipedia/de/math/d/f/4/df48cdb63516e0039cdeae87c9608c63.png ? Best, richart. – Zurechtweiser Jan 17 '12 at 23:38
@RichartBremer, yes that's exactly what I meant - the absolute value of a complex number. – Mark Ransom Jan 18 '12 at 00:12
Alright then, your answer helped me a lot. My next goal is to do voice detection using the results I got. Is there a proven approach for that which you are aware of? – Zurechtweiser Jan 18 '12 at 00:31
@RichartBremer, voice detection is a *lot* more complicated than just frequency detection. I'm afraid I'm completely out of my depth on that one. – Mark Ransom Jan 18 '12 at 03:16

score 0 · Answer 3 · answered Jan 19 '12 at 07:38

You can convert times and frequencies on the fly. To do so, implement __getitem__, and probably use lru_cache to keep some values for later use.

Let's say that Fourier is something like this:

class Fourier:
    def __init__(self, a=10):
        self.a = a

    def __getitem__(self, index):
        # Placeholder: compute and return the Fourier value for this index.
        return self.a + index


t = Fourier()
print(t[12.4])

You can apply the same idea to looking up times. Create a time-indexed object that lets you pick any valid time and either returns the stored value for that time or uses some kind of interpolation to return values that are not in the table.

If you cannot hold all the values in RAM, you can use the shelve module from the standard library to store and access items on disk, and you can put an interpolating interface on top of it if required.
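
A rough sketch of the shelve idea, assuming one stored row per fixed time step. The class `DiskBackedTable` and its `step_ms` parameter are hypothetical, and this version snaps to the containing time step rather than interpolating:

```python
import os
import shelve
import tempfile


class DiskBackedTable:
    """Hypothetical wrapper: table[time_ms] reads a stored row from disk."""

    def __init__(self, path, step_ms):
        self.step_ms = step_ms
        self._db = shelve.open(path)  # creates the file if it does not exist

    def _key(self, time_ms):
        # shelve keys must be strings; snap the time to its step index.
        return str(int(time_ms // self.step_ms))

    def __setitem__(self, time_ms, row):
        self._db[self._key(time_ms)] = row

    def __getitem__(self, time_ms):
        return self._db[self._key(time_ms)]

    def close(self):
        self._db.close()


# Usage: store a row for the step starting at 5000 ms, then read it back
# through a time that falls inside the same 100 ms step.
path = os.path.join(tempfile.mkdtemp(), "spectrogram")
table = DiskBackedTable(path, step_ms=100)
table[5000] = [0.0, 1.5, 3.0]
row = table[5040]
table.close()
```

Each access unpickles a row from disk, so wrapping the lookup in a cache such as lru_cache, as suggested above, may help when the same times are read repeatedly.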
