In an interactive (ipython) session, I think the following is what you want to do (timeit is very nicely handled by ipython):
In [1]: import numpy as np
In [2]: import pyfftw
In [3]: K = 2**16
In [4]: Llx = 10.
In [5]: KT = 2*K
In [6]: dx = Llx/np.float64(K)
In [7]: X = np.arange(-Llx,Llx,dx)
In [8]: a = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
In [9]: b = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
In [10]: fft_object = pyfftw.FFTW(a,b)
In [11]: a[:] = 2.*np.cosh(X)**(-2)
In [12]: timeit np.fft.fft(a)
100 loops, best of 3: 4.96 ms per loop
In [13]: timeit fft_object(a)
100 loops, best of 3: 1.56 ms per loop
In [14]: np.allclose(fft_object(a), np.fft.fft(a))
Out[14]: True
Have you read the tutorial? What don't you understand?
I would recommend using the builders interface to construct the FFTW object. Have a play with the various settings, most importantly the number of threads.
The wisdom is not stored by default. You need to extract it yourself.
All your globals
are unnecessary - the objects you want to change are mutable, so you can handle them just fine. fft_object
always points to the same thing, so no problem with that not being a global. Ideally, you simply don't want that loop over ii
. I suggest working out how to structure your arrays in order that you can do all your operations in a single call
Edit:
[edit edit: I wrote the following paragraph with only a cursory glance at your code, and clearly with it being a recursive update, vectorising is not an obvious approach without some serious cunning. I have a few comments on your implementation at the bottom though]
I suspect your problem is a more fundamental misunderstanding of how to best use a language like Python (or indeed Matlab) for numerical processing. The core tenet is vectorise as much as possible. By this, I mean roll up your python calls to be as few as possible. I can't see how to do that with your example unfortunately (though I've only thought about it for 2 mins). If that's still failing, think about cython - though make sure you really want to go down that route (i.e. you've exhausted the other options).
Regarding the globals: Don't do it that way. If you want to create an object with state, use a class (that is what they are for) or perhaps a closure in your case. The global is almost never what you want (I think I have one at least vaguely legit use for it in all my writing of python, and that's in the cache code in pyfftw). I suggest reading this nice SO question. Matlab is a crappy language - one of the many reasons for this is its crap scoping facilities which tend to lead to bad habits.
You only need global if you want to modify a reference globally. I suggest reading a bit more about the Python scoping rules and what variables really are in python.
FFTW
objects carry with them all the arrays you need so you don't need to pass them around separately. Using the call interface carries almost no overhead (particularly if you disable the normalisation) either for setting or returning the values - if you're at that level of optimisation, I strongly suspect you've hit the limit (I'd caveat this that this may not quite be true for many many very small FFTs, but at this point you want to rethink your algorithm to vectorise the calls to FFTW). If you find a substantial overhead in updating the arrays every time (using the call interface), this is a bug and you should submit it as such (and I'd be pretty surprised).
Bottom line, don't worry about updating the arrays on every call. This is almost certainly not your bottleneck, though make sure you're aware of the normalisation and disable it if you wish (it might slow things down slightly compared to raw accessing of the update_arrays()
and execute()
methods).
Your code makes no use of the cache. The cache is only used when you're using the interfaces
code, and reduces the Python overhead in creating new FFTW objects internally. Since you're handling the FFTW object yourself, there is no reason for a cache.
The builders
code is a less constrained interface to get an FFTW object. I almost always use the builders now (it's much more convenient that creating a FFTW object from scratch). The cases in which you want to create an FFTW object directly are pretty rare and I'd be interested to know what they are.
Comments on the algorithm implementation:
I'm not familiar with the algorithm you're implementing. However, I have a few comments on how you've written it at the moment.
You're computing nl_eval(wp)
on every loop, but as far as I can tell that's just the same as nl_eval(w)
from the previous loop, so you don't need to compute it twice (but this comes with the caveat that it's pretty hard to see what's going on when you have globals everywhere, so I might be missing something).
Don't bother with the copies in my_fft
or my_ifft
. Simply do fft_object(u)
(2.29 ms versus 1.67 ms on my machine for the forward case). The internal array update routine makes the copy unnecessary. Also, as you've written it, you're copying twice: c[:]
means "copy into the array c
", and the array you're copying into c
is v.copy()
, i.e. a copy of v
(so two copies in total).
More sensible (and probably necessary) is copying the output into holding arrays (since that avoids clobbering interim results on calls to the FFTW object), though make sure your holding arrays are properly aligned. I'm sure you've noted this is important but it's rather more understandable to copy the output.
You can move all your scalings together. The 3 in the computation of wn
can be be moved inside my_fft
in nl_eval
. You can also combine this with the normalisation constant from the ifft (and turn it off in pyfftw).
Take a look at numexpr for the basic array operations. It can offer quite a bit of speed-up over vanilla numpy.
Anyway take what you will from all that. No doubt I've missed something or said something incorrect, so please accept it with as much humility as I can offer. It's worth spending a little time working out how Python ticks compared to Matlab (in fact, just forget the latter).