4

I have an implementation of a Inverse Discrete Cosine Transform and I'm trying to figure out how they got to this code. So far, I've figured out that this is probably an optimized implementation of the Cooley-Tukey radix-2 Decimation-in-time for a DCT instead of a DFT (Discrete Fourier Transform).

However, I'm still at a loss about exactly what happens at each stage. I figured that the Wx constants are probably the twiddle factors.

Can anybody provide a reference to an explanation, or provide some explanation to this code?

//Twiddle factors
#define W1 2841 /* 2048*sqrt(2)*cos(1*pi/16) */
#define W2 2676 /* 2048*sqrt(2)*cos(2*pi/16) */
#define W3 2408 /* 2048*sqrt(2)*cos(3*pi/16) */
#define W5 1609 /* 2048*sqrt(2)*cos(5*pi/16) */
#define W6 1108 /* 2048*sqrt(2)*cos(6*pi/16) */
#define W7 565 /* 2048*sqrt(2)*cos(7*pi/16) */

//Discrete Cosine Transform on a row of 8 DCT coefficients.
NJ_INLINE void njRowIDCT(int* blk) {
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;
    int t;
    if (!((x1 = blk[4] << 11)
        | (x2 = blk[6])
        | (x3 = blk[2])
        | (x4 = blk[1])
        | (x5 = blk[7])
        | (x6 = blk[5])
        | (x7 = blk[3])))
    {
        blk[0] = blk[1] = blk[2] = blk[3] = blk[4] = blk[5] = blk[6] = blk[7] = blk[0] << 3;
        return;
    }
    x0 = (blk[0] << 11) + 128;  //For rounding at fourth stage

    //First stage
    /*What exactly are we doing here? Do the x values have a meaning?*/
    x8 = W7 * (x4 + x5);
    x4 = x8 + (W1 - W7) * x4;
    x5 = x8 - (W1 + W7) * x5;
    x8 = W3 * (x6 + x7);
    x6 = x8 - (W3 - W5) * x6;
    x7 = x8 - (W3 + W5) * x7;

    //Second stage
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2);
    x2 = x1 - (W2 + W6) * x2;
    x3 = x1 + (W2 - W6) * x3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;

    //Third stage
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8;
    x4 = (181 * (x4 - x5) + 128) >> 8;

    //Fourth stage
    blk[0] = (x7 + x1) >> 8;  //bit shift is to emulate 8 bit fixed point precision
    blk[1] = (x3 + x2) >> 8;
    blk[2] = (x0 + x4) >> 8;
    blk[3] = (x8 + x6) >> 8;
    blk[4] = (x8 - x6) >> 8;
    blk[5] = (x0 - x4) >> 8;
    blk[6] = (x3 - x2) >> 8;
    blk[7] = (x7 - x1) >> 8;

}
Mysticial
  • 464,885
  • 45
  • 335
  • 332
Tony The Lion
  • 61,704
  • 67
  • 242
  • 415

2 Answers2

4

I'm not an expert at DCTs but I have written a few FFT implementations in my time so I'm going to take a stab at answering this. Please take the following with a pinch of salt.

void njRowIDCT(int* blk)

You correctly say that the algorithm appears to be an 8-length Radix-2 DCT that uses fixed point arithmetic with 24:8 precision. I'm guessing the precision because the last stage right shifts by 8 to get the desired (that and the tell tale comment ;)

Because its 8-length, its power is 3 (2^3 = 8) meaning there are 3 stages in the DCT. So far this is all very similar to FFTs. The "fourth stage" seems to be just a scaling to recover the original precision after fixed point arithmetic.

As far as I can see the input stage is the bit-reversal from input array blk to local variables x0-x7. x8 seems to be a temporary variable. Sorry I can't be more descriptive than that.

Bit reversal stage

Bit Reversal of input

Update

Take a look at DSP For Scientists and Engineers. It gives a clear and precise explanation of signal processing topics. This chapter is on the DCT (please skip to p497).

The Wn (twiddle factors) correspond to the Basis Functions in this chapter, although note this is describing an 8x8 (2D) DCT.

With regard to 3 stages that I mentioned, compare to the description of an 8 point FFT:

8ptFFTGraph

The FFT is performing butterflies on the bit-reversed input array (which are essentially complex multiply-adds), multiplying one path by the Wn or twiddle factor along the way. The FFT is performed in stages. I still haven't worked out what your DCT code is doing but decomposing it into a diagram like this may help.

That or someone who knows what they're talking about step up ;-)

I'll revisit this page and edit as I decipher more code

Community
  • 1
  • 1
Dr. Andrew Burnett-Thompson
  • 20,980
  • 8
  • 88
  • 178
  • Thanks for the +1, although it is an incomplete and slightly sideways answer at this time. I'd appreciate feedback in comments - if you know something to add to this, I'm not here for the glory, be happy to +1 your answers – Dr. Andrew Burnett-Thompson Jan 05 '12 at 06:41
  • 1
    thanks so much for your effort so far, this definitely helps :) If you do find any more out, please add to answer – Tony The Lion Jan 05 '12 at 09:55
  • I think the diagram you have here is actually reversed with respect to the code. In the code, the "deeper" twiddle factors are done in stage 1, whereas the diagram has them in stage 3. – Mysticial Jan 05 '12 at 10:00
  • And as far as the bit-reversals go, it doesn't appear like the output of the code is either in-order or bit-reversed. It looks like it's some sort of reflection/rotation... though I can't see the motivation for it. – Mysticial Jan 05 '12 at 10:02
  • 1
    @TonyTheLion, will do! Regarding bit reversal, if you look at the index of the blk array to x0-x7 it does match this diagram: http://i.stack.imgur.com/ZizHD.png and the indices are bit reversed. The second diagram I posted is an FFT diagram NOT DCT but I posted as there may be information in there that is helpful (ie: stages, twiddle factor manipulation) – Dr. Andrew Burnett-Thompson Jan 05 '12 at 10:21
  • @Mystical, yes the second diagram is FFT NOT DCT. The twiddle factors seem to be applied at stages 1, 2 in the DCT. Why? Not sure. Hmmm ... – Dr. Andrew Burnett-Thompson Jan 05 '12 at 10:22
  • @Dr.AndrewBurnett-Thompson would it help if I posted the code for the ColIDCT too? – Tony The Lion Jan 05 '12 at 10:52
  • @TonyTheLion sure the more the merrier. Have you also tried math.stackexchange.com? It's not programming based but someone might know more about the algorithm. Regards, – Dr. Andrew Burnett-Thompson Jan 05 '12 at 11:25
1

Both the DFT and the DCT are just linear transforms, which can be represented as a single complex matrix multiply (sometime pruned for strictly real input). So you could just combine the above equations to get the formula for each final term, which should end up equivalent to one row of the linear transform matrix (ignoring rounding issues). Then see how the above code sequence is manually doing a common subexpression optimization or refactoring between and/or within row calculations.

hotpaw2
  • 70,107
  • 14
  • 90
  • 153