
Can we get loop unrolling in MicroBlaze C programs built with EDK?

I need this for performance. My C code currently runs serially, so unrolling loops through a compiler directive (e.g. as we do with OpenMP) could accelerate my application.

#pragma Unroll 
for (i = 0; i < 100; i++ ) {
    a[i] = fetch_data(i);
}

Is this possible on MicroBlaze? If so, is there an example?

Paul S
gpuguy

2 Answers


No, there is no directive-driven loop unrolling like that. For tight loops, the common recommendation on the Xilinx forums is to unroll manually 10-20 times and check whether the performance is acceptable, or to write the loop in assembly.

You typically lose 3 or 4 clock cycles on every loop branch, so depending on how long fetch_data takes to execute, you can work out how much unrolling is worthwhile.

/* Unrolled by 10: 100 is a multiple of 10, so no remainder loop is needed */
for (i = 0; i < 100; i += 10) {
    a[i]     = fetch_data(i);
    a[i + 1] = fetch_data(i + 1);
    a[i + 2] = fetch_data(i + 2);
    a[i + 3] = fetch_data(i + 3);
    a[i + 4] = fetch_data(i + 4);
    a[i + 5] = fetch_data(i + 5);
    a[i + 6] = fetch_data(i + 6);
    a[i + 7] = fetch_data(i + 7);
    a[i + 8] = fetch_data(i + 8);
    a[i + 9] = fetch_data(i + 9);
}

Heed the standard loop-unrolling caveats, such as trip counts that aren't a multiple of the unroll factor.

nvuono
  • Thank you very much for the details. So you mean that this way (manual loop unrolling) I can save on the calculations incurred by the conditional test in the for loop? That's fine. But the issue is that in my application the MicroBlaze will be accessing a multi-port memory controller, so I need multiple threads, each accessing one port. How can I proceed to solve this? – gpuguy May 02 '12 at 11:46
  • nvuono's reply (although it doesn't talk about threading) is a great opportunity to introduce "Duff's Device" (http://en.wikipedia.org/wiki/Duff's_device). It's a clever (if abusive) way of getting the correct number of iterations, even when the total number of iterations doesn't divide cleanly into the number of times you've manually unrolled the loop. – Graeme Feb 14 '13 at 23:44

I got this reply from Xilinx (though I have not yet verified it):

http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Optimize-Options.html

See the -funroll-loops section.

The -O optimization switches (available directly in the SDK GUI) may perform loop unrolling, because they enable -floop-optimize, which is documented as:

-floop-optimize Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well.

Enabled at levels -O, -O2, -O3, -Os.

gpuguy