I'm trying to set up a simple brute-force convolution processor on my DE0 Nano Altera FPGA board. Here's what my code looks like:

LIBRARY ieee;
USE ieee.std_logic_1164.all;
use ieee.numeric_bit.all;

ENTITY Convolution IS
    PORT(   clock : IN std_logic;
            audio_in : IN unsigned(15 downto 0);
            audio_out : OUT unsigned(31 downto 0) );    
END Convolution;

ARCHITECTURE Convolution_Core OF Convolution IS

    constant impulse_length : integer := 10;

    type array16 is array(0 to impulse_length-1) of unsigned(15 downto 0);
    type array32 is array(0 to impulse_length-1) of unsigned(31 downto 0);

    constant impulse : array16 :=   (x"FFFF", x"FFFE", x"FFFD", x"FFFC", 
                                                 x"FFFB", x"FFFA", x"FFF9", x"FFF8",
                                                 x"FFF7", x"FFF6");

    signal audio_buffer : array16 := (others=> (others=>'0'));

    signal seq_buffer : unsigned(31 downto 0);

BEGIN
    process(clock)
    begin
        if rising_edge(clock) then
            -- buffer the audio input in audio_buffer
            for i in 0 to (impulse_length-2) loop
                audio_buffer(i) <= audio_buffer(i+1);
            end loop;
            audio_buffer(impulse_length-1) <= audio_in;

            for i in 0 to (impulse_length-1) loop
                if i = 0 then
                    seq_buffer <= audio_buffer(i) * impulse(impulse_length-1-i);
                else
                    seq_buffer <= seq_buffer + audio_buffer(i) * impulse(impulse_length-1-i);
                end if;
            end loop;
        end if;
    end process;

    audio_out <= seq_buffer;

END Convolution_Core;

My problem is: the index of impulse(impulse_length-1-i) doesn't seem to decrease over the successive loop iterations, while the index of audio_buffer(i) does change. That's what I found while simulating the code to figure out why my results are wrong.

I tried to put (impulse_length-1-i) into a signal to be able to watch it in ModelSim. It starts at the max/min of the 32-bit signed range (+/- 2 147 483 647), jumps to zero on the next cycle, and stays at zero.

I also tried using a variable j inside the process, so that I could initialize it to zero at the beginning of the process, use it instead of i as the index for my arrays, and increment it after the actual calculation. That made ModelSim report a fatal error, and I can't figure out why either.

Could someone explain to me what I did wrong?

Thanks in advance.

nick_g
ricothebrol
  • See also http://stackoverflow.com/questions/13954193/is-process-in-vhdl-reentrant/13956532#13956532 –  Oct 03 '16 at 11:18

2 Answers

The main problem is that you don't understand how signals and for loops work when you're describing hardware instead of writing software.

Each iteration of the second for loop assigns a value to the same signal. Within a process, only the last assignment to a given signal takes effect, and every read of a signal sees the value it held before the process started executing. This means that only the last iteration (i = impulse_length-1) of your second for loop does anything: on each clock edge it is effectively seq_buffer <= seq_buffer + audio_buffer(impulse_length-1) * impulse(0).
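To make the "last assignment wins" rule concrete, here is a minimal sketch. The entity `last_assignment_demo` and its ports are hypothetical and not part of the question's design; it also assumes `ieee.numeric_std` rather than the `numeric_bit` package used in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- assumption: numeric_std, not the question's numeric_bit

-- Hypothetical demo entity: two assignments to the same signal in one process.
entity last_assignment_demo is
    port ( clock : in  std_logic;
           a, b  : in  unsigned(7 downto 0);
           y     : out unsigned(7 downto 0) );
end entity last_assignment_demo;

architecture rtl of last_assignment_demo is
    signal acc : unsigned(7 downto 0) := (others => '0');
begin
    process (clock)
    begin
        if rising_edge(clock) then
            acc <= a;        -- scheduled, but overridden by the next line
            acc <= acc + b;  -- reads the OLD value of acc (not a); this one wins
        end if;
    end process;

    y <= acc;
end architecture rtl;
```

After each rising edge, `acc` holds its previous value plus `b`; the assignment of `a` never has any visible effect. The second for loop in the question behaves the same way once it is unrolled.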

I wrote an answer a few years ago about how signals and variables work within a VHDL process which can give you a bit more detail on this: https://stackoverflow.com/a/19234859/1360002

If you write it such that all 10 multiply/add operations happen in the same cycle (for example by using a variable in place of seq_buffer to calculate the value you then assign to the seq_buffer signal), you're describing hardware that has a very long logic path and won't meet timing if your clock rate is even moderately high. This may not be an issue in your case.
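A minimal sketch of that variable-based version follows. It is an illustration under stated assumptions, not a drop-in fix: the entity name `convolution_var` is made up, it uses `ieee.numeric_std` instead of the question's `numeric_bit`, and it widens the output to 36 bits, because the sum of ten 32-bit products can need up to 4 more bits than a single product.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- assumption: numeric_std instead of numeric_bit

entity convolution_var is  -- hypothetical name
    port ( clock     : in  std_logic;
           audio_in  : in  unsigned(15 downto 0);
           audio_out : out unsigned(35 downto 0) );  -- assumption: 32 bits + 4 guard bits
end entity convolution_var;

architecture rtl of convolution_var is
    constant impulse_length : integer := 10;

    type array16 is array (0 to impulse_length-1) of unsigned(15 downto 0);

    constant impulse : array16 := (x"FFFF", x"FFFE", x"FFFD", x"FFFC", x"FFFB",
                                   x"FFFA", x"FFF9", x"FFF8", x"FFF7", x"FFF6");

    signal audio_buffer : array16 := (others => (others => '0'));
begin
    process (clock)
        -- A variable is updated immediately, so each iteration sees the
        -- running total left by the previous iteration.
        variable acc : unsigned(35 downto 0);
    begin
        if rising_edge(clock) then
            -- Shift register: every iteration drives a different element.
            for i in 0 to impulse_length-2 loop
                audio_buffer(i) <= audio_buffer(i+1);
            end loop;
            audio_buffer(impulse_length-1) <= audio_in;

            -- Ten multiply/adds, unrolled into one combinational chain.
            -- audio_buffer is read with its pre-shift contents here.
            acc := (others => '0');
            for i in 0 to impulse_length-1 loop
                acc := acc + resize(audio_buffer(i) * impulse(impulse_length-1-i),
                                    acc'length);
            end loop;

            audio_out <= acc;  -- single signal assignment, registered
        end if;
    end process;
end architecture rtl;
```

In `numeric_std`, multiplying two 16-bit `unsigned` values returns a 32-bit result, which is why each product is resized before being added to the wider accumulator.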

Furthermore, you might have problems with the result width coming out of the multiply operator, but I can't be sure: I don't normally infer multiplier units, so I'm unfamiliar with the details of the associated operator function.

QuantumRipple

Thanks a lot for answering me!

So if I understand correctly, using a for loop with variables inside a process leads to the same kind of logic implementation as using "generate" statements. Maybe that's why it took ages to compile when I tried something similar for my convolution algorithm ;)

So I guess the only way to do something similar to a for loop in Java or C is to "manually" clock each iteration, switch the input signals, and buffer the successive results, right?

But then I must say I don't understand why the "audio buffering" loop I wrote works correctly, even in a timing simulation...

Best regards

ricothebrol
  • The "audio buffering" for loop works because when the loop is unrolled, each iteration is assigning a value to a different index in your signal. This means they can all operate in parallel without conflict. For loops do not really operate sequentially on signals. Generate statements work just like for loops, but instead of having only the final assignment matter, you'll encounter multiple driver errors during implementation. – QuantumRipple Oct 03 '16 at 19:12
  • You are correct in that the only way to do truly sequential logic is to have some sort of state machine that muxes inputs to your math block, but if you want to have a fresh output every cycle that is the combination of all 10 values, you have to use a variable to create a long timing path with the 10 multiply-adds or pipeline your math (which won't save resources, but will provide shorter logic paths). – QuantumRipple Oct 03 '16 at 19:15
  • Compilation time isn't generally meaningful for how timing efficient or resource efficient a design is. It is correlated with how much resources are used overall vs how many exist in your target FPGA, but not how many are used as compared to how many are actually necessary (efficiency) for the function you're trying to implement. – QuantumRipple Oct 03 '16 at 19:16
  • Thanks a lot for sharing your knowledge! Another thing that I find surprising is that when I compile only the "audio buffer" part of my code, Quartus II says it uses 0 logic elements. I even tried it with a buffer size in the thousands: still 0 logic elements... Is that realistic? And is there any point in implementing a circular buffer on an FPGA, if it can do thousands of shifts in parallel? I hope my questions aren't too silly... Thanks in advance – ricothebrol Oct 03 '16 at 20:53
  • Without the math portion, no part of your audio buffer contributes to any outputs. Therefore the synthesizer will optimize it away, regardless of the size. If you connect `audio_buffer(0)` to the lower bits of your `audio_out`, you'll see the actual size of your shift register. Note that in hardware, actual shift registers will be more efficient than circular buffers until your shift register is deep enough to be implemented using the hardened block RAMs. – QuantumRipple Oct 03 '16 at 21:06
  • That's a good thing actually, addressing a circular buffer seems a bit tricky ;) Thanx a lot – ricothebrol Oct 04 '16 at 04:43
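For the "state machine that muxes inputs to your math block" option mentioned in the comments, a rough sketch could look like the following. Everything beyond the question's names is an assumption made for illustration: the entity name `convolution_seq`, the `sample_valid` strobe (taken to be a one-cycle pulse that arrives no more often than every impulse_length + 1 clocks, and ignored while a calculation is running), the `out_valid` flag, the 36-bit accumulator, and the use of `ieee.numeric_std`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- assumption: numeric_std instead of numeric_bit

entity convolution_seq is  -- hypothetical name
    port ( clock        : in  std_logic;
           sample_valid : in  std_logic;  -- assumption: one-cycle pulse per new sample
           audio_in     : in  unsigned(15 downto 0);
           audio_out    : out unsigned(35 downto 0);
           out_valid    : out std_logic );  -- high for one cycle when audio_out is complete
end entity convolution_seq;

architecture rtl of convolution_seq is
    constant impulse_length : integer := 10;

    type array16 is array (0 to impulse_length-1) of unsigned(15 downto 0);

    constant impulse : array16 := (x"FFFF", x"FFFE", x"FFFD", x"FFFC", x"FFFB",
                                   x"FFFA", x"FFF9", x"FFF8", x"FFF7", x"FFF6");

    signal audio_buffer : array16 := (others => (others => '0'));
    signal acc   : unsigned(35 downto 0) := (others => '0');
    signal index : integer range 0 to impulse_length-1 := 0;
    signal busy  : std_logic := '0';
begin
    process (clock)
    begin
        if rising_edge(clock) then
            out_valid <= '0';  -- default; overridden below on the last step

            if busy = '0' then
                if sample_valid = '1' then
                    -- Shift the new sample in and start a new accumulation.
                    for i in 0 to impulse_length-2 loop
                        audio_buffer(i) <= audio_buffer(i+1);
                    end loop;
                    audio_buffer(impulse_length-1) <= audio_in;
                    acc   <= (others => '0');
                    index <= 0;
                    busy  <= '1';
                end if;
            else
                -- One multiply/add per clock: index selects the operands.
                acc <= acc + resize(audio_buffer(index) * impulse(impulse_length-1-index),
                                    acc'length);
                if index = impulse_length-1 then
                    busy      <= '0';
                    out_valid <= '1';
                else
                    index <= index + 1;
                end if;
            end if;
        end if;
    end process;

    audio_out <= acc;
end architecture rtl;
```

This version needs only one multiplier and has a short logic path, at the cost of taking impulse_length clocks per output sample instead of producing a fresh result every cycle.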