I advise you to try not to think in terms of "temporary variables", "for loops" and "while loops". These are software constructs that can be useful, but ultimately you are designing a piece of hardware. Try to think about which physical pieces of hardware (registers, adders, multiplexers) can be connected together to achieve your design, and then how you might describe them in VHDL. This is difficult at first.
You should provide more information about what exactly you want to achieve (and on what kind of hardware) to increase the probability of getting a good answer.
You don't mention whether your multiplier needs to operate on signed or unsigned inputs. Let's assume signed, because that's a bit harder.
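As has been noted, this whole exercise makes little sense if implemented combinationally: the loop would have to be unrolled into a chain of adders sized for the largest possible value of B. So let's assume you want a clocked (sequential) implementation.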
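To illustrate why, here is a minimal sketch of what a literal combinational translation of the "loop" might look like. The entity name `loopy_mult_comb` is hypothetical and its ports simply mirror the clocked version. Because the loop bound has to be static for synthesis, the tool unrolls it into one add/subtract stage per iteration for the worst case |B| = 2**(g_b_bits-1), so the amount of hardware roughly doubles with every extra bit of B. It is shown for comparison only, not as a recommendation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical combinational version of loopy_mult, for comparison only
entity loopy_mult_comb is
  generic(
    g_a_bits : positive := 4;
    g_b_bits : positive := 4
  );
  port(
    in_a   : in  signed(g_a_bits-1 downto 0);
    in_b   : in  signed(g_b_bits-1 downto 0);
    out_ab : out signed(g_a_bits+g_b_bits-1 downto 0)
  );
end loopy_mult_comb;

architecture rtl of loopy_mult_comb is
begin
  comb_proc : process(in_a, in_b)
    -- These variables do not "hold" anything between evaluations;
    -- each assignment just names the output of another adder or mux
    variable sum : signed(g_a_bits+g_b_bits-1 downto 0);
    variable mag : unsigned(g_b_bits-1 downto 0);
  begin
    -- Magnitude of B (the most negative B wraps to itself, which is
    -- still the correct magnitude when read as unsigned)
    if in_b(g_b_bits-1) = '0' then
      mag := unsigned(in_b);
    else
      mag := unsigned(-in_b);
    end if;
    sum := (others => '0');
    -- The bound is static, so synthesis unrolls this into
    -- 2**(g_b_bits-1) chained add/subtract stages, each one
    -- only taking effect when i <= |B|
    for i in 1 to 2**(g_b_bits-1) loop
      if i <= to_integer(mag) then
        if in_b(g_b_bits-1) = '0' then
          sum := sum + in_a;
        else
          sum := sum - in_a;
        end if;
      end if;
    end loop;
    out_ab <= sum;
  end process comb_proc;
end rtl;
```

The clocked version reuses a single adder over several cycles instead, which is why it scales far better.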
You also don't mention how often you expect new inputs to arrive. This makes a big difference to the implementation. If you expect frequent inputs (e.g. every clock cycle), then you need a pipelined implementation, which uses more hardware. If you expect infrequent inputs (e.g. every 16 or more clock cycles), then a cheaper serial implementation should be used. I don't think either one is necessarily more difficult to write than the other.
Assuming you want a serial implementation, I would start somewhere along these lines:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity loopy_mult is
  generic(
    g_a_bits : positive := 4;
    g_b_bits : positive := 4
  );
  port(
    clk       : in  std_logic;
    srst      : in  std_logic;
    -- Input
    in_valid  : in  std_logic;
    in_a      : in  signed(g_a_bits-1 downto 0);
    in_b      : in  signed(g_b_bits-1 downto 0);
    -- Output
    out_valid : out std_logic;
    out_ab    : out signed(g_a_bits+g_b_bits-1 downto 0)
  );
end loopy_mult;

architecture rtl of loopy_mult is
  signal a         : signed(g_a_bits-1 downto 0);
  signal b_sign    : std_logic;
  signal countdown : unsigned(g_b_bits-1 downto 0);
  signal sum       : signed(g_a_bits+g_b_bits-1 downto 0);
begin

  mult_proc : process(clk)
  begin
    if rising_edge(clk) then
      if srst = '1' then
        out_valid <= '0';
        countdown <= (others => '0');
      else
        if in_valid = '1' then -- (Initialize)
          -- Record the value of A and sign of B for later
          a      <= in_a;
          b_sign <= in_b(g_b_bits-1);
          -- Initialize countdown
          if in_b(g_b_bits-1) = '0' then
            -- Input B is positive
            countdown <= unsigned(in_b);
          else
            -- Input B is negative
            countdown <= unsigned(-in_b);
          end if;
          -- Initialize sum
          sum <= (others => '0');
          -- Set the output valid flag if we're already finished (B=0)
          if in_b = 0 then
            out_valid <= '1';
          else
            out_valid <= '0';
          end if;
        elsif countdown > 0 then -- (Loop)
          -- Let's assume the target is an FPGA with efficient add/sub
          if b_sign = '0' then
            sum <= sum + a;
          else
            sum <= sum - a;
          end if;
          -- Set the output valid flag when we get to the last loop
          if countdown = 1 then
            out_valid <= '1';
          else
            out_valid <= '0';
          end if;
          -- Decrement countdown
          countdown <= countdown - 1;
        else
          -- (Idle)
          out_valid <= '0';
        end if;
      end if;
    end if;
  end process mult_proc;

  -- Output
  out_ab <= sum;

end rtl;
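This is not particularly efficient: the result is ready |B| clock cycles after the inputs are accepted, which is up to 2**(g_b_bits-1) cycles for the most negative B. It is, however, intended to be relatively easy to read and understand. There are many, many improvements you could make depending on your requirements.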
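As one example of such an improvement, a common alternative is the shift-and-add method: handle one bit of B per clock cycle instead of adding A |B| times, so the latency becomes a fixed g_b_bits cycles, independent of the value of B. This is a swapped-in technique rather than a tweak of the code above. The sketch below assumes two's complement inputs and the same port list as `loopy_mult`; the entity name `shift_add_mult` and its internal signal names are hypothetical. Since the sign bit of B has weight -2**(g_b_bits-1) in two's complement, the last step subtracts instead of adding.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical shift-and-add alternative with the same ports as loopy_mult
entity shift_add_mult is
  generic(
    g_a_bits : positive := 4;
    g_b_bits : positive := 4
  );
  port(
    clk       : in  std_logic;
    srst      : in  std_logic;
    in_valid  : in  std_logic;
    in_a      : in  signed(g_a_bits-1 downto 0);
    in_b      : in  signed(g_b_bits-1 downto 0);
    out_valid : out std_logic;
    out_ab    : out signed(g_a_bits+g_b_bits-1 downto 0)
  );
end shift_add_mult;

architecture rtl of shift_add_mult is
  -- A, sign-extended to the product width and shifted left once per step
  signal a_shift : signed(g_a_bits+g_b_bits-1 downto 0);
  -- Remaining bits of B; bit 0 is the bit handled in the current step
  signal b_reg   : signed(g_b_bits-1 downto 0);
  -- Number of bits of B still to process
  signal steps   : natural range 0 to g_b_bits;
  signal sum     : signed(g_a_bits+g_b_bits-1 downto 0);
begin
  mult_proc : process(clk)
  begin
    if rising_edge(clk) then
      if srst = '1' then
        out_valid <= '0';
        steps     <= 0;
      elsif in_valid = '1' then
        a_shift   <= resize(in_a, g_a_bits+g_b_bits);
        b_reg     <= in_b;
        steps     <= g_b_bits;
        sum       <= (others => '0');
        out_valid <= '0';
      elsif steps > 0 then
        if b_reg(0) = '1' then
          if steps = 1 then
            -- Last step: the sign bit of B has weight -2**(g_b_bits-1)
            sum <= sum - a_shift;
          else
            sum <= sum + a_shift;
          end if;
        end if;
        a_shift <= shift_left(a_shift, 1);
        b_reg   <= shift_right(b_reg, 1);
        steps   <= steps - 1;
        -- out_valid and the final sum are registered on the same edge
        if steps = 1 then
          out_valid <= '1';
        else
          out_valid <= '0';
        end if;
      else
        out_valid <= '0';
      end if;
    end if;
  end process mult_proc;

  out_ab <= sum;
end rtl;
```

It still uses a single adder, at the cost of a wider register for the shifted copy of A, and because the interface matches it should be usable wherever `loopy_mult` is.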