As you are trying to execute the steps of an algorithm in different clock cycles, you have realised that the "sequential" constructs within a process do not, by themselves, do this, and in fact variables do not help. A sequential program, unless it explicitly waits on an event (e.g. wait until rising_edge(clk);), will be unrolled and executed in a single clock cycle.
As you have probably discovered using variables, this may be rather a long clock cycle.
There are three main ways of sequentialising execution in VHDL, with different purposes.
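Let's try each of them to implement a linear interpolation between a and b, where c is the 16-bit blend factor (c = 0 returns a). Ignoring operand widths for the moment, the arithmetic we want is:
    signal a, b, c, x : unsigned(15 downto 0);
    x <= ((a * (65536 - c)) + (b * c)) / 65536;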
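For contrast, here is a minimal sketch of the single-cycle version that the opening paragraph warns about: everything is written "sequentially" with variables, yet it all unrolls into one combinational chain between two registers. The entity name lerp_single_cycle and its port list are assumptions made for illustration; they are not part of the original question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: name and ports are illustrative only.
entity lerp_single_cycle is
    port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(15 downto 0);
        x       : out unsigned(15 downto 0)
    );
end entity lerp_single_cycle;

architecture rtl of lerp_single_cycle is
begin
    process(clk) is
        variable c1     : unsigned(16 downto 0);  -- 65536 - c needs 17 bits
        variable p0, p1 : unsigned(31 downto 0);
        variable s      : unsigned(31 downto 0);
    begin
        if rising_edge(clk) then
            -- All of this is evaluated within ONE clock cycle:
            -- subtract, two multiplies and an add in a single logic chain.
            c1 := to_unsigned(65536, 17) - c;
            p0 := resize(a * c1, 32);  -- 33-bit product, but always < 2**32
            p1 := b * c;
            s  := p0 + p1;
            x  <= s(31 downto 16);     -- divide by 65536
        end if;
    end process;
end architecture rtl;
```

This should synthesise, but the clock period has to cover a subtract, a multiply and an add back to back, and two multipliers are needed. The three approaches that follow trade that long cycle for several short ones.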
(1) is the classic state machine; the best form being the single process SM.
Here the computation is broken down into several cycles, which ensures that at most one multiply is in progress at a time (multipliers are expensive!), while c1 is computed in parallel with the first multiply (addition/subtraction is cheap!). It could safely be rewritten with variables instead of signals for the intermediate results.
    type state_type is (idle, step_1, step_2, done);
    signal state : state_type := idle;
    signal start : boolean := false;
    signal c1 : unsigned(16 downto 0); -- range includes 65536!
    signal p0, p1, s : unsigned(31 downto 0);

    process(clk) is
    begin
        if rising_edge(clk) then
            case state is
                when idle =>
                    if start then
                        p1 <= b * c;
                        -- 17-bit constant, so that 65536 is not truncated to 16 bits
                        c1 <= to_unsigned(65536, 17) - c;
                        state <= step_1;
                    end if;
                when step_1 =>
                    -- a * c1 is 33 bits wide but always fits in 32
                    p0 <= resize(a * c1, 32);
                    state <= step_2;
                when step_2 =>
                    s <= p0 + p1;
                    state <= done;
                when done =>
                    x <= s(31 downto 16);
                    if not start then -- avoid retriggering
                        state <= idle;
                    end if;
            end case;
        end if;
    end process;
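The remark about variables can be sketched as follows. This is only one possible rewrite, wrapped in a hypothetical entity (lerp_fsm_vars, with start as a boolean input port) so that it stands on its own. It is safe because each variable is written in one state and read only in a later state, so each one still infers a register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: name and ports are illustrative only.
entity lerp_fsm_vars is
    port (
        clk     : in  std_logic;
        start   : in  boolean;
        a, b, c : in  unsigned(15 downto 0);
        x       : out unsigned(15 downto 0)
    );
end entity lerp_fsm_vars;

architecture rtl of lerp_fsm_vars is
    type state_type is (idle, step_1, step_2, done);
    signal state : state_type := idle;
begin
    process(clk) is
        -- Intermediate results are now local to the process.
        variable c1        : unsigned(16 downto 0);
        variable p0, p1, s : unsigned(31 downto 0);
    begin
        if rising_edge(clk) then
            case state is
                when idle =>
                    if start then
                        p1 := b * c;
                        c1 := to_unsigned(65536, 17) - c;
                        state <= step_1;
                    end if;
                when step_1 =>
                    -- reads the c1 stored in the previous cycle;
                    -- a must still be stable here, as in the signal version
                    p0 := resize(a * c1, 32);
                    state <= step_2;
                when step_2 =>
                    s := p0 + p1;
                    state <= done;
                when done =>
                    x <= s(31 downto 16);
                    if not start then  -- avoid retriggering
                        state <= idle;
                    end if;
            end case;
        end if;
    end process;
end architecture rtl;
```

The behaviour should match the signal version cycle for cycle. The main difference is scoping: the intermediates are no longer visible outside the process.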
(2) is the "implicit state machine" described in the article linked by Martin Thompson (excellent article!).
The same remarks apply to it as to the explicit state machine. Note that a process containing wait statements must not have a sensitivity list.
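    process is -- no sensitivity list: the waits below do the clocking
    begin
        wait until rising_edge(clk);
        if start then
            p1 <= b * c;
            c1 <= to_unsigned(65536, 17) - c;
            wait until rising_edge(clk);
            p0 <= resize(a * c1, 32);
            wait until rising_edge(clk);
            s <= p0 + p1;
            wait until rising_edge(clk);
            x <= s(31 downto 16);
            while start loop -- avoid retriggering
                wait until rising_edge(clk);
            end loop;
        end if;
    end process;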
(3) is a pipelined processor. Here, execution takes several cycles, yet everything happens in parallel! The depth of the pipeline (in cycles) allows each logically sequential step to happen in sequence. This allows high performance, as long chains of computation are broken into cycle-sized steps.
    signal start : boolean := false;
    signal c1 : unsigned(16 downto 0); -- range includes 65536!
    signal pa, pb, pb2, s : unsigned(31 downto 0);
    signal a1 : unsigned(15 downto 0);

    process(clk) is
    begin
        if rising_edge(clk) then
            -- first cycle
            pb <= b * c;
            c1 <= to_unsigned(65536, 17) - c; -- 17-bit constant, no truncation
            a1 <= a; -- save copy of a for next cycle
            -- second cycle
            pa <= resize(a1 * c1, 32); -- NB this is the LAST cycle copy of c1 not the new one!
            pb2 <= pb; -- save copy of product b
            -- third cycle
            s <= pa + pb2;
            -- fourth cycle
            x <= s(31 downto 16);
        end if;
    end process;
Here, resources are NOT shared; it will use 2 multipliers since there are 2 multiplies in each clock cycle. It will also use a lot more registers for the intermediate results and copies. However, given new values for a, b, c in every cycle it will spit out a new result every cycle, four cycles delayed from the inputs.
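Because the result arrives four cycles after its inputs, the surrounding logic usually needs to know which outputs are meaningful. A common way to track this is a "valid" shift register of the same depth as the pipeline. The sketch below is an assumption about how you might do it, not part of the design above; the entity pipe_valid and its valid_in and valid_out ports are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays a valid flag by the pipeline latency.
entity pipe_valid is
    port (
        clk       : in  std_logic;
        valid_in  : in  std_logic;  -- high while new a, b, c are presented
        valid_out : out std_logic   -- high while the matching x is present
    );
end entity pipe_valid;

architecture rtl of pipe_valid is
    -- Four stages, matching the four-cycle pipeline above.
    constant LATENCY : positive := 4;
    signal valid_sr  : std_logic_vector(LATENCY - 1 downto 0) := (others => '0');
begin
    process(clk) is
    begin
        if rising_edge(clk) then
            -- Shift left by one stage per clock, inserting the new flag at bit 0.
            valid_sr <= valid_sr(LATENCY - 2 downto 0) & valid_in;
        end if;
    end process;

    -- The oldest flag lines up with the x register of the pipeline.
    valid_out <= valid_sr(LATENCY - 1);
end architecture rtl;
```

Clocked from the same clk as the pipeline, valid_out goes high in the same cycle that x holds the result for the inputs that accompanied valid_in. If the pipeline depth changes, LATENCY has to change with it.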