What is the practical difference between implementing FOR-LOOP and FOR-GENERATE? When is it better to use one over the other?

Question

Let's suppose I have to test different bits of a std_logic_vector. Would it be better to implement a single process that loops over every bit with a for-loop, or to instantiate 'n' processes using for-generate, with each process testing one bit?

FOR-LOOP

my_process: process(clk, reset) begin
  if rising_edge (clk) then
    if reset = '1' then
      --init stuff
    else
      for_loop: for i in 0 to n loop
        test_array_bit(i);
      end loop;
    end if;      
  end if; 
end process;

FOR-GENERATE

for_generate: for i in 0 to n generate begin
my_process: process(clk, reset) begin
  if rising_edge (clk) then
    if reset = '1' then
      --init stuff
    else
      test_array_bit(i);
    end if;
  end if; 
end process;
end generate;

What would be the impact on FPGA and ASIC implementations in these cases? Which is easier for the CAD tools to deal with?

EDIT: Adding a response I gave to one of the people helping, to make my question clearer:

For instance, when I ran a piece of code using for-loops in ISE, the synthesis summary gave me a fair result, but it took a long while to compute everything. When I re-coded my design using for-generate and several processes, I used a bit more area, but the tool computed everything much faster and my timing result was better as well. So, does this imply a rule that it is always better to use for-generate, at the cost of extra area but with lower complexity, or is this one of those cases where I have to verify every implementation possibility?

score 6 · Accepted Answer · 2015-07-14T11:14:41.423

Assuming relatively simple logic in the reset and test functions (for example, no interactions between adjacent bits) I would have expected both to generate the same logic.

Understand that since the entire for loop is executed in a single clock cycle, synthesis will unroll it and generate a separate instance of test_array_bit for each input bit. Therefore it is quite possible for synthesis tools to generate identical logic for both versions - at least in this simple example.
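To make the comparison concrete, here is a minimal sketch that puts both styles side by side in one entity, so the two halves can be compared in the RTL or technology schematic. The body of test_array_bit is not shown in the question, so the per-bit test below (`not data(i)`) is only a placeholder, and the names `loop_vs_generate`, `q_loop` and `q_gen` are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity loop_vs_generate is
  generic (
    N : natural := 7  -- highest bit index, so N+1 bits are tested
  );
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;
    data   : in  std_logic_vector(N downto 0);
    q_loop : out std_logic_vector(N downto 0);
    q_gen  : out std_logic_vector(N downto 0)
  );
end entity loop_vs_generate;

architecture rtl of loop_vs_generate is
begin

  -- Style 1: one process with a sequential for-loop.
  -- Every iteration completes within the same clock edge.
  loop_proc : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        q_loop <= (others => '0');
      else
        for i in 0 to N loop
          q_loop(i) <= not data(i);  -- placeholder per-bit test (assumption)
        end loop;
      end if;
    end if;
  end process loop_proc;

  -- Style 2: N+1 processes, one per bit, created by for-generate.
  gen_bits : for i in 0 to N generate
    gen_proc : process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          q_gen(i) <= '0';
        else
          q_gen(i) <= not data(i);   -- same placeholder test
        end if;
      end if;
    end process gen_proc;
  end generate gen_bits;

end architecture rtl;
```

With a per-bit test this simple and no interaction between bits, I would expect both outputs to end up as the same N+1 registers, but that is worth checking in your own tool rather than taking on trust.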

And on that basis, I would (marginally) prefer the for ... loop version because it localises the program logic, whereas the "generate" version globalises it, placing it outside the process boilerplate. If you find the loop version slightly easier to read, then you will agree at some level.

However, it doesn't pay to be dogmatic about style, and your experiment illustrates this: in your case the loop synthesises to inferior hardware. Synthesis tools are complex and imperfect pieces of software, like highly optimising compilers, and share many of the same issues. Sometimes they miss an "obvious" optimisation, and sometimes they make a complex optimisation that (e.g. in software) runs slower because its increased size thrashed the cache.

So it's preferable to write in the cleanest style where you can, but with some flexibility for working around tool limitations and occasionally real tool defects.

Different versions of the tools remove (and occasionally introduce) such defects. You may find that ISE's "use new parser" option (for pre-Spartan-6 parts) or Vivado or Synplicity get this right where ISE's older parser doesn't. (For example, older ISE versions had serious bugs when passing signals out of procedures.)

It might be instructive to modify the example and see if synthesis can "get it right" (produce the same hardware) for the simplest case, and re-introduce complexity until you find which construct fails.
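One way to "re-introduce complexity" is to add an interaction between iterations. The sketch below is a hypothetical example, not the asker's code: it counts the set bits using a variable that is carried from one loop iteration to the next. The entity name `ones_count` and its ports are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ones_count is
  generic (
    N : natural := 7  -- highest bit index, so N+1 bits are counted
  );
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    data  : in  std_logic_vector(N downto 0);
    count : out natural range 0 to N + 1
  );
end entity ones_count;

architecture rtl of ones_count is
begin

  count_proc : process (clk)
    -- A variable is updated immediately, so each iteration sees the
    -- value left behind by the previous one.
    variable acc : natural range 0 to N + 1;
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count <= 0;
      else
        acc := 0;
        for i in 0 to N loop
          if data(i) = '1' then
            acc := acc + 1;  -- depends on the previous iteration's acc
          end if;
        end loop;
        count <= acc;        -- registered result of the whole loop
      end if;
    end if;
  end process count_proc;

end architecture rtl;
```

Here the unrolled loop is no longer N+1 independent pieces of logic: taken literally it is a chain of conditional increments, and how well a given tool restructures that chain is exactly the kind of thing that can differ between parsers and versions. A for-generate of separate processes cannot share `acc` like this, so the two styles stop being interchangeable at this point.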

If you discover something concrete this way, it's worth reporting here (by answering your own question). Xilinx used to encourage reporting such defects via its Webcase system; eventually they were even fixed! They seem to have stopped that, however, in the last year or two.

score 0 · Answer 2 · answered Jul 13 '15 at 21:37


The first snippet would be equivalent to the following:

my_process: process(clk, reset) begin
  if rising_edge (clk) then
    if reset = '1' then
      --init stuff
    else
      test_array_bit(0);
      test_array_bit(1);
      -- ...
      test_array_bit(n);
    end if;      
  end if; 
end process;

The second one, by contrast, will generate n+1 processes, one for each i, each with its own copy of the reset logic and everything else (which might be a problem, as that logic will attempt to drive the same signals from different processes).

In general, for loops are sequential statements containing sequential statements (i.e. each iteration is sequenced to be executed after the previous one). For-generate loops are concurrent statements containing concurrent statements, which is how you can use them to create several instances of a component, for example.
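A minimal sketch of the multiple-driver point, under the assumption that each bit produces a "hit" flag and the design also wants a combined flag. The names `generate_drivers`, `hit` and `any_hit` are hypothetical. Each generated process drives only its own element `hit(i)`; the shared result is formed once, outside the generate.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity generate_drivers is
  generic (
    N : natural := 7  -- highest bit index
  );
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;
    data    : in  std_logic_vector(N downto 0);
    any_hit : out std_logic
  );
end entity generate_drivers;

architecture rtl of generate_drivers is
  signal   hit     : std_logic_vector(N downto 0);
  constant NO_HITS : std_logic_vector(N downto 0) := (others => '0');
begin

  gen_bits : for i in 0 to N generate
    bit_proc : process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          hit(i) <= '0';      -- each process resets only its own bit
        else
          hit(i) <= data(i);  -- placeholder per-bit test (assumption)
          -- Assigning any_hit here instead would give that one signal
          -- a separate driver in every generated process.
        end if;
      end if;
    end process bit_proc;
  end generate gen_bits;

  -- Single driver for the shared flag, outside the generate.
  any_hit <= '0' when hit = NO_HITS else '1';

end architecture rtl;
```

Because `i` is a generate parameter, `hit(i)` names one fixed element per process, so the drivers do not collide. Inside a single process with a for-loop this problem does not arise, since the whole loop belongs to one driver.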


Eugene Sh.


Hi Eugene, thanks a lot for replying. I am sorry, I should have been clearer in my question. I know the basics of coding and the "logical" implications of using both structures. My doubt is about the practical result of using them. I used the example with one simple vector just as a showcase, but in practice things are far more complicated. So, to reformulate: if I have a piece of code that should give me a reply in one clock cycle (or run all its contents in parallel, if that description suits better), what would be the best implementation strategy from a physical-level point of view? – Felipe GM Jul 14 '15 at 01:30
For instance, when I ran a piece of code using for-loops in ISE, the synthesis summary gave me a fair result, but it took a long while to compute everything. When I re-coded my design using for-generate and several processes, I used a bit more area, but the tool computed everything much faster and my timing result was better as well. So, does this imply a rule that it is always better to use for-generate at the cost of extra area, or is this one of those cases where I have to verify every implementation possibility? – Felipe GM Jul 14 '15 at 01:42
The faster computation resulted from the parallel nature of the second implementation. And, as always, there is a trade-off with the number of resources used. If you look at the RTL schematic of both implementations you will see the difference. Personally I am using `generate` loops when coding RTL logic, i.e. when specifying the exact hardware structure, and the `for` loops with behavioural descriptions (well, nearly never :) ) – Eugene Sh. Jul 14 '15 at 01:59

@Eugene Sh : the `for ... loop` version is also inherently parallel as it *must* execute in a single cycle. These descriptions are (neglecting the unseen code) exactly equivalent. Why doesn't synthesis see them that way? ... good question – Jul 14 '15 at 11:13
