1

I'm coming from software land, and trying to find out how to code sequential algorithm in VHDL. From the text book, it says that the statements inside a process are executed sequentially. But I realized it's only true when it comes to variable, rather than signals. Re signals inside a process,, they get updated at the end of process, and the evaluation is using right operand's previous value. So for my understanding, it's still concurrent. For performance purpose, I cannot always use variables for complex computation.

  1. But how to use signals to present sequential algorithm? My initial thoughts are using FSM. Is that true? Is FSM the only way to properly code sequential algorithm in VHDL?
  2. If I'm right that the signals statements within a process is kind of concurrent, then what's the difference between this and the signal concurrent assignment in the architecture level? Does the process's sequential nature only apply to variable assignment?
Cœur
  • 37,241
  • 25
  • 195
  • 267
user2035335
  • 11
  • 1
  • 2
  • It's not clear whether you are talking about a sequential algorithm, (executing in a single cycle) or sequential execution of an algorithm, taking several cycles. –  Feb 02 '13 at 21:38
  • Thanks Brian. To be clear, the sequential algorithm here I mean sequential execution of an algorithm taking several clock cycles. – user2035335 Feb 02 '13 at 22:07
  • Can you comment what the goal of your computation is? Are you trying to use VHDL as a programming language or are you trying to describe hardware? – BennyBarns Feb 04 '13 at 08:57
  • Thank you for your reply. There is no particular algorithm in mind, I just mean any sequential computation which needs several clock cycles to run in large. And I do want to describe the behavior, but maybe disappoint you that I don't want to describe the hardware. But can we make the conversation a bit more efficient by jumping onto the point directly (maybe I'm too impatient)? – user2035335 Feb 04 '13 at 13:55

2 Answers2

5

As you are trying to execute steps of an algorithm in different cycles, you have realised that the "sequential" constructs within a process do not, by themselves, do this - and in fact, variables do not help. A sequential program - unless it uses explicit "wait for some_event" e.g. wait for rising_edge(clk) - will be unrolled and execute in a single clock cycle.

As you have probably discovered using variables, this may be rather a long clock cycle.

There are three main ways of sequentialising execution in VHDL, with different purposes.

Let's try them to implement a linear interpolation between a and b,

a, b, c, x : unsigned(15 downto 0);
x <= ((a * (65536 - c)) + (b * c)) / 65536;

(1) is the classic state machine; the best form being the single process SM. Here the computation is broken down into several cycles which ensure that at most one multiply is in progress at a time (multipliers are expensive!) but C1 is computed in parallel (addition/subtraction is cheap!). It could safely be re-written with variables instead of signals for the intermediate results.

type state_type is (idle, step_1, step_2, done);
signal state     : state_type := idle;
signal start     : boolean := false;
signal c1        : unsigned(16 downto 0); -- range includes 65536!
signal p0, p1, s : unsigned(31 downto 0);

process(clk) is
begin
   if rising_edge(clk) then
      case state is
      when idle   => if start then
                        p1    <= b * c;
                        c1    <= 65536 - c;
                        state <= step_1;
                     end if;
      when step_1 => P0 <= a * c1;
                     state <= step_2;
      when step_2 => s <= p0 + p1;
                     state <= done;
      when done   => x <= s(31 downto 16);
                     if not start then  -- avoid retriggering
                        state <= idle;  
                     end if;
      end case;
   end if;
end process;

(2) is the "implicit state machine" linked by Martin Thompson (excellent article!) ... edited to add link as Martin's answer disappeared. Same remarks apply to it as for the explicit state machine.

process(clk) is
begin
   if start then
      p1 <= b * c;
      c1 <= 65536 - c;
      wait for rising_edge(clk);
      p0 <= a * c1;
      wait for rising_edge(clk);
      s  <= p0 + p1;
      wait for rising_edge(clk);
      x  <= s(31 downto 16);
      while start loop
         wait for rising_edge(clk);
      end loop;
   end if;
end process;

(3) is a pipelined processor. Here, execution takes several cycles, yet everything happens in parallel! The depth of the pipeline (in cycles) allows each logically sequential step to happen in sequential manner. This allows high performance as long chains of computations are broken into cycle-sized steps...

    signal start     : boolean := false;
    signal c1        : unsigned(16 downto 0); -- range includes 65536!
    signal pa, pb, pb2, s : unsigned(31 downto 0);
    signal a1        : unsigned(15 downto 0);

process(clk) is
begin
   if rising_edge(clk) then
      -- first cycle
      pb <= b * c;
      c1 <= 65536 - c;
      a1 <= a;     -- save copy of a for next cycle
      -- second cycle
      pa <= a1 * c1;  -- NB this is the LAST cycle copy of c1 not the new one!
      pb2 <= pb;   -- save copy of product b
      -- third cycle
      s  <= pa + pb2;
      -- fourth cycle
      x  <= s(31 downto 16);
   end if;
end process;

Here, resources are NOT shared; it will use 2 multipliers since there are 2 multiplies in each clock cycle. It will also use a lot more registers for the intermediate results and copies. However, given new values for a,b,c in every cycle it will spit out a new result every cycle - four cycles delayed from the inputs.

  • Hi @brian-drummond, thank you very much for going through my post and giving detailed explanation. From my understanding, re signal assignment in process, it's concurrent instead of sequential. Can I expand a bit to all the rest sequential sequential statements, e.g. if, case, when, etc.? Signals assignemts within them are also concurrent like within process, aren't they? Cheers. – user2035335 Feb 06 '13 at 19:28
  • It's a little different than that - and I disagree to an extent with Zennehoy that the software approach is completely misleading : the flow of control (especially with variables) and result of an algorithm is identical to the software view : what is different is the time (one clock cycle!) - and some constraints to make that possible. Signal assignments are different : they are your inter-process communications - and their semantics are truly elegant : answered here... http://stackoverflow.com/questions/13954193/is-process-in-vhdl-reentrant/13956532#13956532 –  Feb 06 '13 at 19:44
  • When you understand both signal assignment and variable assignment you will see that you can also write the pipeline model using variables for the internal values BUT you have to describe the pipeline backwards! (fourth cycle first). That way you use the previous cycle's value of s, THEN generate the new value. It works, but I prefer describing my pipelines forwards... –  Feb 06 '13 at 19:56
1
  1. Most multi-cycle algorithms can be implemented either by using an FSM as you suggest, or by using pipelined logic. Pipelined logic is probably the better choice if the algorithm consists of strictly sequential steps (i.e., no loops), an FSM would typically only be used for more complex algorithms that require different control flows depending on the input.

    Pipelined logic is effectively a very long chain of combinatorial logic split into multiple "stages" using registers, with data flowing from one stage to the next. The registers are added to reduce the delay of each stage (between two registers), allowing higher clock frequencies at the cost of increased latency. Note however that higher latency does not mean lower throughput, since new data can begin processing before the previous data item has completed! This is generally not possible with an FSM.

  2. The biggest difference between signal assignment within a process as opposed to the architecture is that you may assign a value to a signal in multiple places within the process, with the last assignment "winning". At the architecture level, only a single assignment statement to a signal is possible. Many control flow statements (if, case/when, etc.) are also only available within a process, not at the architecture level.

Community
  • 1
  • 1
zennehoy
  • 6,405
  • 28
  • 55
  • Thank you @zennehoy for your help. Now I know FSM and pipelined logic can be used to implement sequential algorithm. But it's very ambiguous that all the text book say sequential statements, i.e. process, if, case, when, etc., can be used as software programming C to present sequential algorithms, but based on your explanation and my understanding, they're actually concurrent when it comes to signals. It's quite confusing for beginners like me. I might be missing something, please let me know if I'm wrong. – user2035335 Feb 06 '13 at 10:55
  • @user2035335 Any book on HDL that in any way suggests to use software programming approaches while coding HDL should be trashed, imho! Everything(!) in hardware happens concurrently, just with different data. The only notion of "sequential" is the sequence in which data passes from one hardware component (be it a register, gate, transistor, whatever) to the next. So yes, I agree that everything happening within a process can and should be considered as happening concurrently, you just have to figure out what part of the process "wins" (=output of the final multiplexer). – zennehoy Feb 06 '13 at 15:59