
I have a w*y-bit-wide std_logic_vector named matrix, where w and y are integers. I want a y-bit-wide std_logic_vector called output whose bits are each concurrently assigned the AND of w bits of matrix.

For example, w=5 y=3:

output(2) <= matrix(14) and matrix(13) and matrix(12) and matrix(11) and matrix(10);
output(1) <= matrix(9) and matrix(8) and matrix(7) and matrix(6) and matrix(5);
output(0) <= matrix(4) and matrix(3) and matrix(2) and matrix(1) and matrix(0);

In the example, output is y = 3 bits long, and each output bit is assigned the AND of w = 5 bits of matrix.

Now I want to write it with generics. I tried two nested for...generate loops but could not get it to work. What should go on the right-hand side of output(i)? Other approaches are very welcome too; it does not have to be done the way I tried.

library ieee;
use ieee.std_logic_1164.all;

entity module is

    generic (
        w : integer := 5;   -- input width
        y : integer := 3    -- output width
    );

    port (
        matrix     : in  std_logic_vector(w * y - 1 downto 0);  -- matrix
        output     : out std_logic_vector(y - 1 downto 0)       -- output
    );
end entity module;

architecture rtl of module is

begin  -- architecture rtl


    AND_FOR: for i in y - 1 downto 0 generate
        AND_FOR2: for j in w - 1 downto 0 generate
            output(i) <= ????;
        end generate AND_FOR2;
    end generate AND_FOR;

end architecture rtl;
efe373
  • If you have access to VHDL 2008, I believe it has a reduction and operator; if not, it's not that hard to code one up in a function yourself. With that, you'd only need a single-level for generate. Alternatively, you can replace the second for generate with a process and for loop and store the product in a temporary variable. After the for loop, save the variable to the output bit that you want. See https://stackoverflow.com/questions/20296276/and-all-elements-of-an-n-bit-array-in-vhdl. – Travis Apr 01 '22 at 20:59
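The process-and-loop alternative from the comment above could look roughly like this. It is a sketch only: the outer for generate is kept, the inner one is replaced by a process whose variable accumulates the AND, and the architecture name proc_loop and variable name acc are hypothetical. It uses only VHDL-93 constructs (an explicit sensitivity list).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity module is
    generic (
        w : integer := 5;   -- number of matrix bits ANDed per output bit
        y : integer := 3    -- output width
    );
    port (
        matrix : in  std_logic_vector(w * y - 1 downto 0);
        output : out std_logic_vector(y - 1 downto 0)
    );
end entity module;

-- Hypothetical architecture: one process per output bit replaces the
-- inner generate; a variable holds the running AND of that bit's group.
architecture proc_loop of module is
begin
    AND_FOR: for i in y - 1 downto 0 generate
        AND_PROC: process (matrix)
            variable acc : std_logic;  -- running AND of matrix(i*w+w-1 .. i*w)
        begin
            acc := '1';                -- '1' is the identity element for AND
            for j in w - 1 downto 0 loop
                acc := acc and matrix(i * w + j);
            end loop;
            output(i) <= acc;          -- after the loop, drive the output bit
        end process AND_PROC;
    end generate AND_FOR;
end architecture proc_loop;
```

Because i is a generate parameter, each process drives only its own element output(i), so there are no conflicting drivers.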

2 Answers


The goal here is to find a way to describe the range of elements in each of your assignments as a slice, with bounds computed during elaboration, and pass that slice to a reduction AND, using as few generate statements as possible.

Generate statements are attractive because they provide static indexed names or slice names, which preserves synthesis eligibility. VHDL has no way to elaborate expressions of differing complexity (here, differing length), which is what makes an AND reduction attractive. Depending on your synthesis tool, you may be 'encouraged' to use a preexisting function such as AND_REDUCE from the Synopsys package std_logic_misc.

In -2008 we can use the unary AND operator (the reserved word and followed by a single operand). In tools compliant with earlier revisions of the VHDL standard we can use a function call:

architecture rtl of module is
    -- In -2008, use the unary AND operator instead of calling this function:
    function reduce_and (inp: std_logic_vector) return std_logic is
        variable retval:    std_logic := '1';
    begin
        for i in inp'range loop
            retval := retval and inp(i);
        end loop;
        return retval;
    end function;
begin
AND_FOR: 
    for i in y - 1 downto 0 generate
        output(i) <= reduce_and(matrix((i + 1) * w - 1 downto i * w));
    end generate;

describe_outputs:
    process
    begin
        report "matrix'range is (" & integer'image(matrix'left) & 
               " downto " & integer'image(matrix'right) & ")";
        for i in y - 1 downto 0 loop
            report "output(" & integer'image(i) & ") <= reduce_and(matrix(" &
                integer'image((i + 1) * w - 1 ) & " downto " &
                integer'image (i * w) & ")";
        end loop;
        wait;
    end process;
end architecture;

A for generate statement elaborates a block for each value of its generate parameter (here i), which is declared as a constant in the block's declarative region. Both w and y are globally static generic constants, so the range of matrix is also globally static. In practice this means a generate statement like this yields assignment statements that are eligible for synthesis.

The process labeled describe_outputs is only there to demonstrate equivalence with your original assignment statements (that is, that the elements being ANDed together can be described algorithmically), since the question doesn't provide a testbench with a value for matrix and an expected result. The process can be removed.

As long as you use values of w and y that don't produce a null range or null slice (that is, w >= 1 and y >= 1), this method works with the hard-coded direction (downto).

The output format of report statements is implementation dependent (beyond some required 'header' information). Here it is shown for ghdl:

%: ghdl -a module.vhdl
%: ghdl -e module
%: ghdl -r module
module.vhdl:35:9:@0ms:(report note): matrix'range is (14 downto 0)
module.vhdl:38:13:@0ms:(report note): output(2) <= reduce_and(matrix(14 downto 10)
module.vhdl:38:13:@0ms:(report note): output(1) <= reduce_and(matrix(9 downto 5)
module.vhdl:38:13:@0ms:(report note): output(0) <= reduce_and(matrix(4 downto 0)
%:
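For comparison, with a VHDL-2008 tool the reduce_and function can be dropped and the unary and operator applied directly to the slice. A minimal sketch, assuming ghdl with --std=08 (or another -2008-compliant tool); the architecture name rtl2008 is hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity module is
    generic (
        w : integer := 5;   -- number of matrix bits ANDed per output bit
        y : integer := 3    -- output width
    );
    port (
        matrix : in  std_logic_vector(w * y - 1 downto 0);
        output : out std_logic_vector(y - 1 downto 0)
    );
end entity module;

-- Requires VHDL-2008: "and" with a single std_logic_vector operand is the
-- unary reduction operator from ieee.std_logic_1164 and returns one bit.
architecture rtl2008 of module is
begin
    AND_FOR: for i in y - 1 downto 0 generate
        output(i) <= and matrix((i + 1) * w - 1 downto i * w);
    end generate AND_FOR;
end architecture rtl2008;
```

The slice bounds are the same as in the reduce_and version, so the elaborated assignments are identical; only the reduction is written as an operator instead of a function call.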
user16145658
  • I actually needed reduce_xor, so I changed `variable retval: std_logic := '1';` to `variable retval: std_logic := '0';`, replaced `and` with `xor`, and used this solution. It works quite well! Thank you. – efe373 Apr 05 '22 at 17:51
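The XOR variant described in the comment above could be packaged for reuse roughly as follows. This is a sketch: the package name reduce_pkg is hypothetical, and the function is simply reduce_and with '0' as the initial value and xor in place of and.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the XOR counterpart of reduce_and.
package reduce_pkg is
    function reduce_xor (inp : std_logic_vector) return std_logic;
end package reduce_pkg;

package body reduce_pkg is
    function reduce_xor (inp : std_logic_vector) return std_logic is
        -- '0' is the identity element for XOR, as '1' is for AND.
        variable retval : std_logic := '0';
    begin
        for i in inp'range loop
            retval := retval xor inp(i);
        end loop;
        return retval;   -- '1' when an odd number of bits are '1'
    end function reduce_xor;
end package body reduce_pkg;
```

With use work.reduce_pkg.all; in the context clause, it is called exactly like reduce_and inside the generate loop, e.g. output(i) <= reduce_xor(matrix((i + 1) * w - 1 downto i * w));. In VHDL-2008 the unary xor operator gives the same result without a function.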

You can indeed model this with a two-dimensional generate, but you need an intermediate signal for the outputs of the 2-input AND gates.

Let's treat your input as a matrix with y rows and w columns. For ease of explanation, we denote by m(i,j) the element in row i, column j of matrix m, so your matrix(i*w+j) is denoted matrix(i,j).

You want output(i), where 0 <= i <= y-1, to be the AND of all bits in row i of matrix. You can use a tmp matrix with y rows and w-1 columns to hold the results of the 2-input ANDs. tmp(i,0) is the AND of matrix(i,0) and matrix(i,1); tmp(i,1) is the AND of tmp(i,0) and matrix(i,2); and so on. Finally, tmp(i,w-2) is the AND of tmp(i,w-3) and matrix(i,w-1), and it is also the output(i) you want.

VHDL code:

library ieee;
use ieee.std_logic_1164.all;

entity module is
  generic(
    w: integer := 5;
    y: integer := 3
  );
  port(
    matrix: in  std_logic_vector(w * y - 1 downto 0);
    output: out std_logic_vector(y - 1 downto 0)
  );
end entity module;

architecture rtl of module is
  type matrix_t is array(natural range <>, natural range <>) of std_logic;
  signal tmp: matrix_t(0 to y - 1, 0 to w - 2);
begin
  and_for: for i in 0 to y - 1 generate
    tmp(i, 0) <= matrix(i * w) and matrix(i * w + 1);
    and_for2: for j in 1 to w - 2 generate
      tmp(i, j) <= tmp(i, j - 1) and matrix(i * w + j + 1);
    end generate and_for2;
    output(i) <= tmp(i, w - 2);
  end generate and_for;
end architecture rtl;

Demo:

library ieee;
use ieee.std_logic_1164.all;

entity module_sim is
end entity module_sim;

architecture sim of module_sim is
  constant w: natural := 5;
  constant y: natural := 3;
  signal matrix: std_logic_vector(w * y - 1 downto 0);
  signal output: std_logic_vector(y - 1 downto 0);
begin
  u0: entity work.module(rtl)
  port map(
    matrix => matrix,
    output => output
  );

  process
  begin
    matrix <= (others => '1');
    wait for 1 ns;
    report to_string(output);
    for i in 0 to y - 1 loop
      matrix(i * w) <= '0';
    end loop;
    wait for 1 ns;
    report to_string(output);
    matrix <= (others => '1');
    wait for 1 ns;
    report to_string(output);
    wait;
  end process;
end architecture sim;
$ ghdl -a --std=08 module_sim.vhd
$ ghdl -r --std=08 module_sim
module_sim.vhd:50:5:@1ns:(report note): 111
module_sim.vhd:55:5:@2ns:(report note): 000
module_sim.vhd:58:5:@3ns:(report note): 111

But, as suggested in another answer, using an AND reduction function instead of the innermost generate statement is simpler and less error-prone.
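One limitation of the chain above: tmp(i, 0) reads matrix(i * w + 1), so it needs w >= 2 (with w = 1, tmp has a null second dimension and tmp(i, 0) is out of range). A possible variant, sketched here under that observation with the hypothetical architecture name rtl_w1, gives tmp w columns and makes column 0 a plain copy of matrix(i,0), so the same chain also works for w = 1:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity module is
  generic(
    w: integer := 5;
    y: integer := 3
  );
  port(
    matrix: in  std_logic_vector(w * y - 1 downto 0);
    output: out std_logic_vector(y - 1 downto 0)
  );
end entity module;

-- Hypothetical variant: tmp(i, j) is the AND of matrix(i, 0) .. matrix(i, j).
architecture rtl_w1 of module is
  type matrix_t is array(natural range <>, natural range <>) of std_logic;
  signal tmp: matrix_t(0 to y - 1, 0 to w - 1);
begin
  and_for: for i in 0 to y - 1 generate
    tmp(i, 0) <= matrix(i * w);             -- start of the chain for row i
    and_for2: for j in 1 to w - 1 generate  -- no iterations when w = 1
      tmp(i, j) <= tmp(i, j - 1) and matrix(i * w + j);
    end generate and_for2;
    output(i) <= tmp(i, w - 1);             -- last column is the full AND
  end generate and_for;
end architecture rtl_w1;
```

The cost is one extra column of tmp per row, which is only a wire; a synthesis tool should produce the same gates as the original for w >= 2.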

Renaud Pacalet
  • In the past we'd worry about how many processes there are (here roughly w * y) and how many times they get executed before simulation time advances. Not so much now: the limiting factor on simulation performance is the amount of memory in a model, particularly for signals. There's also a single-process model (a function call expression), which for synthesis only needs the output registered when w is large enough for gate depth to be a concern, depending also on the FPGA logic cell architecture. A properly constrained synthesis tool can produce the same result from any of these descriptions. – user16145658 Apr 04 '22 at 08:45