0

I am creating a ruby wrapper for the fftw3 library for the Scientific Ruby Foundation which uses nmatrix objects instead of regular ruby arrays.

I have a curious problem in returning the transformed array in that I am not sure how to do this so I can check the transform has been computed correctly against octave or (something like this) in my specs

I have an idea that I might be best to cast the output array out which is an fftw_complex type to a VALUE to pass it to the nmatrix object before returning but I am not sure whether I should be using a wisdom and getting the values from that with fftw.

Here is the method and the link to the spec output on travis-ci

static VALUE
fftw_r2c_one(VALUE self, VALUE nmatrix)
{
  VALUE cNMatrix = rb_define_class("NMatrix", rb_cObject);

  fftw_plan plan;

  VALUE shape = rb_funcall(nmatrix, rb_intern("shape"), 0);

  const int size = NUM2INT(rb_funcall(cNMatrix, rb_intern("size"), 1, shape));

  double* in = ALLOC_N(double, size);

  for (int i = 0; i < size; i++)
  {
    in[i] = NUM2DBL(rb_funcall(nmatrix, rb_intern("[]"), 1, INT2FIX(i)));
    printf("IN[%d]: in[%.2f] \n", i, in[i]);
  }
  fftw_complex* out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * size + 1);

  plan = fftw_plan_dft_r2c(1,&size, in, out, FFTW_ESTIMATE);

  fftw_execute(plan);

  fftw_destroy_plan(plan);
  xfree(in);
  fftw_free(out);
  return nmatrix;
}

Feel free to clone the repo from github and have a play about, if you like.

Note: I am pretty new to fftw3 and have not used C (or ruby) much, before starting this project. I had got more used to java, python and javascript to date so haven't quite got my head around lower level concepts like memory management but am getting the with this project. Please bear that in mind in your answers, and try to see that they are clear for someone and who up to recently has mainly got used to an object orientated approach up to now by avoiding jargon (or taking care to point it out) as that would really help.

Thank you.

Magpie
  • 607
  • 2
  • 7
  • 26

2 Answers2

0

If you want to see any performance benefit from using FFTW then you'll need to re-factor this code so that plan generation is performed only once for a given FFT size, since plan generation is quite costly, while executing the plan is where the performance gains come from.

You could either

  • a) have two entry points - an initialisation routine which generates the plan and then a main entry point which executes the plan

  • b) use a memorization technique so that you only generate the plan once, the first time you are called for a given FFT dimension, and then you cache the plan for subsequent re-use.

The advantage of b) is that it is a cleaner implementation with a single entry point; the disadvantage being that it breaks if you call the function with dimensions that change frequently.

Magpie
  • 607
  • 2
  • 7
  • 26
Paul R
  • 208,748
  • 37
  • 389
  • 560
  • Please let me know if there is some way I can improve my question to make the intention clearer. This is interesting, but does not seem to answer my question. Thanks nonetheless! – Magpie Aug 22 '14 at 18:51
  • 1
    Sorry - you're right - it does not address the specific problem of returning a transposed array, however the question is moot as it stands, since the current implementation will be so inefficient as to not be worthwhile, unless you refactor it as I explained above. – Paul R Aug 22 '14 at 19:22
  • hmmm, interesting. Can you elaborate or reference what you mean? I am not sure I fully understand your referral to the performance, rightly. I've mainly been focusing on core functionality so for now at least, my priority is just to get things working but with that said, I am interested in what you're saying as I do ultimately want the library to perform well (once it does work properly, that is). – Magpie Aug 22 '14 at 22:03
  • OK - in a nutshell, FFTW works like this: creating a plan is quite expensive - it can take *seconds* to do this if you go for maximum optimisation with the "exhaustive" flag, and even with the less demanding flags it still takes way longer than the FFT itself. So, given that you typically want to calculate a number of similar FFTs, the general idea is that you generate the plan only once, e.g. during an initialisation phase, and then re-use that same plan every time you perform an FFT. I hope that helps to clarify my suggestions in the answer above. – Paul R Aug 23 '14 at 05:48
0

I got some advice from Colin Fuller and after some pointers from him I came up with this solution:

VALUE fftw_complex_to_nm_complex(fftw_complex* in) {
    double real = ((double (*)) in)[1];
    double imag = ((double (*)) in)[2];
    VALUE mKernel = rb_define_module("Kernel");
    return rb_funcall(mKernel,
                      rb_intern("Complex"),
                      2,
                      rb_float_new(real),
                      rb_float_new(imag));
}

/**
  fftw_r2c
  @param self
  @param nmatrix
  @return nmatrix

  With FFTW_ESTIMATE as a flag in the plan,
  the input and and output are not overwritten at runtime
  The plan will use a heuristic approach to picking plans
  rather than take measurements
*/
static VALUE
fftw_r2c_one(VALUE self, VALUE nmatrix)
{
 /**
  Define and initialise the NMatrix class:
  The initialisation rb_define_class will
  just retrieve the NMatrix class that already exists
  or define a new class altogether if it does not
  find NMatrix. */
  VALUE cNMatrix = rb_define_class("NMatrix", rb_cObject);

  fftw_plan plan;

  const int rank = rb_iv_set(self, "@rank", 1);


  // shape is a ruby array, e.g. [2, 2] for a 2x2 matrix
  VALUE shape = rb_funcall(nmatrix, rb_intern("shape"), 0);
  // size is the number of elements stored for a matrix with dimensions = shape
  const int size = NUM2INT(rb_funcall(cNMatrix, rb_intern("size"), 1, shape));

  double* in = ALLOC_N(double, size);
  fftw_complex* out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * size * size);

  for (int i = 0; i < size; i++)
  {
    in[i] = NUM2DBL(rb_funcall(nmatrix, rb_intern("[]"), 1, INT2FIX(i)));;
  }

  plan = fftw_plan_dft_r2c(1,&size, in, out, FFTW_ESTIMATE);
  fftw_execute(plan);

  for (int i = 0; i < 2; i++)
  {
    rb_funcall(nmatrix, rb_intern("[]="), 2, INT2FIX(i), fftw_complex_to_nm_complex(out + i));
  }

  // INFO: http://www.fftw.org/doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
  fftw_destroy_plan(plan);

  xfree(in);
  fftw_free(out);
  return nmatrix;
}

The only problem which remains it getting the specs to recognise the output types which I am looking at solving in the ruby core Complex API

Magpie
  • 607
  • 2
  • 7
  • 26