
I have been unable to find an answer, possibly because I can't put specific enough names on the processes involved.

I use Vitis HLS to synthesize designs in which one call of the top-level function corresponds to one clock cycle (pipelined, of course). This works fine for almost all of our cases. Where it is not possible (i.e. for components where we need to guarantee specific latencies or pipeline depths), I use Verilog.

The goal is to transfer data via DMA to a Zynq-7000's memory and THEN issue an interrupt to let the PS know that the DMA transfer is finished.

Suppose I have a Vitis HLS project in which the PS initiates a DMA transfer of uint32s by producing a rising edge on a signal in an s_axilite interface to my component, as in the code below:

```cpp
#include <cstdint>

void Example
(
    uint32_t *dmaRegion,    // m_axi master port (DMA target in DDR)
    bool &intrSig,          // interrupt output to the PS
    volatile bool writeNow  // set by the PS via AXI-Lite
)
{
    #pragma HLS PIPELINE II=1
    #pragma HLS INLINE RECURSIVE
    #pragma HLS INTERFACE s_axilite port=return bundle=registers
    #pragma HLS INTERFACE ap_ctrl_none port=return
    
    #pragma HLS INTERFACE m_axi port=dmaRegion offset=slave bundle=x
    #pragma HLS INTERFACE s_axilite port=dmaRegion bundle=registers
    #pragma HLS INTERFACE ap_none port=dmaRegion
    
    #pragma HLS INTERFACE s_axilite port=writeNow bundle=registers
    #pragma HLS INTERFACE ap_none port=writeNow
    
    #pragma HLS INTERFACE ap_none port=intrSig
    
    static bool lastWriteNow { false };  // writeNow from the previous call
    static uint32_t Ctr { 0 };
    
    bool intr = false;
    
    // Rising edge on writeNow: write the counter to word 10 and pulse intrSig.
    if (!lastWriteNow && writeNow)
    {
        Ctr++;
        dmaRegion[10] = Ctr;
        intr = true;
    }
    intrSig = intr;
    
    lastWriteNow = writeNow;
}
```
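Because C simulation runs this function as ordinary C++, with dmaRegion being a plain array, the edge-detection logic can be checked in isolation with something like the sketch below. It is a pragma-free copy of Example() (with volatile dropped) plus hypothetical helpers step() and checkEdgePulses() that are not part of the original project. As far as I know, this untimed level of simulation has no notion of WREADY, so it will always show a one-call pulse; bus backpressure only shows up in RTL co-simulation or on hardware.

```cpp
#include <cassert>
#include <cstdint>

// Pragma-free copy of Example() for plain C++ / C simulation.
// volatile is dropped: it has no effect in this untimed model.
void Example(std::uint32_t *dmaRegion, bool &intrSig, bool writeNow)
{
    static bool lastWriteNow{false};
    static std::uint32_t Ctr{0};

    bool intr = false;
    if (!lastWriteNow && writeNow)  // rising edge on writeNow
    {
        Ctr++;
        dmaRegion[10] = Ctr;        // just an array store in C simulation
        intr = true;
    }
    intrSig = intr;
    lastWriteNow = writeNow;
}

// Hypothetical helper: one call of Example() stands for one clock cycle.
bool step(std::uint32_t *mem, bool writeNow)
{
    bool intr = false;
    Example(mem, intr, writeNow);
    return intr;
}

// Hypothetical self-check: exactly one pulse (and one write) per rising edge.
void checkEdgePulses()
{
    std::uint32_t mem[16] = {};
    assert(!step(mem, false));
    assert(step(mem, true));   // rising edge: pulse and write
    assert(mem[10] == 1);
    assert(!step(mem, true));  // level held high: no new pulse
    assert(!step(mem, false));
    assert(step(mem, true));   // second rising edge
    assert(mem[10] == 2);
}
```

Note that the function-local statics carry state across calls, just as the registers do in hardware, so a test sequence must start from a known writeNow level.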

This seems to work fine and produces a one-clock-cycle interrupt pulse as long as WREADY is driven high by the Zynq (through a SmartConnect to my component), and I have found some examples that do it this way. The PS also reads the correct data from the DDR memory directly after the interrupt (the L2 data cache has been disabled for this memory region).

However, what happens if, for example, several AXI masters share the SmartConnect and cause congestion, effectively driving WREADY low for this component? In my tests I tied the WREADY signal of the AXI SmartConnect master interface to a constant zero to simulate (permanent) congestion. The interrupt signal (and WVALID) then stayed high permanently, which would mean... what? That the HLS design is blocked inside the if clause? I don't quite understand this, because it seems to contradict the II=1 constraint (which Vitis HLS reports as satisfied).

In a way this makes sense, of course, since WVALID must go high when data is available and must stay high until WREADY is also high. What escapes me is why the interrupt line goes (and stays) high regardless, even though the transaction has not finished yet.

Is this at all possible with any guarantees about the m_axi interface, or will I have to find other solutions?

Any hint and information (especially background information about that behaviour) is very much appreciated.

Edit:

For example, this works fine: Example 1

but this causes the interrupt to stay high forever: Example 2

Of course, the transaction cannot finish. But it seems I have no way of unblocking the design so long as the AXI bus is congested.

PhilMasteG
  • I want to help you but I have some questions first. You say "The goal is to transfer data via DMA to a Zynq-7000's memory and THEN issue an interrupt to let the PS know that the DMA transfer is finished." This is done already by the Xilinx DMA IP. Are you using this feature, and if not, what is preventing you from using it? – Fra93 Jun 21 '22 at 15:19
  • Also, adding a sketch of your block design (or top level with IPs and their connections, if everything is done in RTL) would help a lot in understanding the issue. – Fra93 Jun 21 '22 at 15:23
  • When "synthesized" by HLS, this core instantiates an AXI master interface, which I connect to the first GP AXI slave port of the Zynq. This works (data is transferred to the DDR memory addresses). Is that the wrong way of doing this? I was unable to find another one. I will look into providing you with a minimal example project before the bounty expires, of course. – PhilMasteG Jun 21 '22 at 17:37
  • This is fine; I thought you also had a Xilinx DMA IP, which already triggers the interrupt for you. Also, good call disabling the cache. – Fra93 Jun 21 '22 at 19:37

1 Answer


Vitis Scheduler view

When I compile your code and look at the schedule view this is the result:

[Image: Vitis HLS schedule view]

What I understand is that there is a phi node (a term borrowed from LLVM), which means that the value of intrSig can't be set before the AXI4 write response has completed. Because this is converted into RTL, the signal must always have some value; if it goes high and the AXI4 bus is then congested, it stays high until the AXI transaction has finished.
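To make this hypothesis concrete, here is a toy cycle-level model in plain C++. It is an assumption-laden illustration, not the RTL Vitis HLS generates: the real m_axi adapter buffers writes and also waits for the write response, whereas this sketch simply freezes the whole pipeline (every register, including the one driving intrSig) while a W beat is pending and WREADY is low. The names ToyPipeline and run() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (ASSUMPTION): one pipeline register drives intrSig, and the
// pipeline is clock-enabled only when no W beat is stuck waiting for WREADY.
struct ToyPipeline {
    bool lastWriteNow = false;
    bool intrReg = false;   // pipeline register that drives intrSig
    bool wvalid = false;    // a W beat is pending on the bus
    std::uint32_t ctr = 0;
    std::uint32_t wdata = 0;

    // One rising clock edge; returns intrSig after the edge.
    bool tick(bool writeNow, bool wready) {
        bool stalled = wvalid && !wready;     // beat offered but not accepted
        if (wvalid && wready) wvalid = false; // handshake completes this edge
        if (!stalled) {
            bool intr = false;
            if (!lastWriteNow && writeNow) {  // rising edge, as in Example()
                ++ctr;
                wdata = ctr;
                wvalid = true;  // AXI: WVALID must stay high until WREADY
                intr = true;
            }
            intrReg = intr;
            lastWriteNow = writeNow;
        }
        // When stalled, no register updates, so intrReg keeps its last value.
        return intrReg;
    }
};

// Drives one writeNow/wready pair per cycle and records intrSig.
std::vector<bool> run(const std::vector<bool>& writeNow,
                      const std::vector<bool>& wready) {
    assert(writeNow.size() == wready.size());
    ToyPipeline p;
    std::vector<bool> intr;
    for (std::size_t i = 0; i < writeNow.size(); ++i)
        intr.push_back(p.tick(writeNow[i], wready[i]));
    return intr;
}
```

With WREADY tied low from the start, `run({false, true, true, false}, {false, false, false, false})` yields intrSig values false, true, true, true: the pulse is frozen high, which matches the behaviour described in the question. Once WREADY rises, the pipeline advances again and the pulse ends.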

HLS craziness

I tried to look into the HDL without much luck. I only got an intuition, which I'll try to share:

The red wires are the ones that eventually drive the intrSig signal. The flip-flop is driven to 1 through its SET port and to 0 through its RST port.

[Image: RTL schematic with the wires driving intrSig highlighted in red]

It is a long way from this FF to intrSig, but it eventually gets there:

[Image: path from the flip-flop to intrSig]

The SET signal is driven by combinatorial logic using writeNow:

[Image: combinatorial logic from writeNow driving the SET port]

Lastly, wready also travels a long way, but it feeds into the pipeline chain of registers that eventually drives intrSig.

[Images: path of wready into the register chain that drives intrSig]

Is this proof of what is happening? Unfortunately not, but there are some hints that the outcome of the m_axi transaction prevents the interrupt pipeline from advancing.

Some debugging hints

I don't know whether clearing the wready signal actually simulates congestion. An AXI write starts with the awready (address) handshake, and I would expect a congested interconnect to stop accepting transactions from the beginning, i.e. to hold awready low.
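To explore what holding awready low does, here is a toy C++ model of a single-beat write. It encodes the AXI4 channel dependencies as I understand them from the specification (a master must not wait for AWREADY or WREADY before asserting AWVALID or WVALID, and BVALID may only follow both the address and data handshakes); everything else, including the names WriteTrace and simulateWrite and the one-cycle BVALID delay, is a hypothetical simplification. A real interconnect may also hold WREADY until it has accepted the address, which can be modeled by making wready depend on the cycle.

```cpp
#include <cassert>
#include <functional>

// Cycle at which each handshake of a single-beat write completed
// (-1 = never within the simulated window).
struct WriteTrace {
    int awCycle = -1;  // AWVALID && AWREADY
    int wCycle = -1;   // WVALID && WREADY (single beat, WLAST = 1)
    int bCycle = -1;   // BVALID && BREADY
};

// Toy single-beat AXI4 write, assuming: the master raises AWVALID and WVALID
// in cycle 0 without waiting for either READY and holds each until its
// handshake; BREADY is tied high; the slave raises BVALID one cycle after both
// the AW and W handshakes are done. awready(c) / wready(c) give the slave's
// READY in cycle c, so congestion is modeled by returning false.
WriteTrace simulateWrite(const std::function<bool(int)>& awready,
                         const std::function<bool(int)>& wready,
                         int maxCycles) {
    WriteTrace t;
    bool bvalid = false;
    for (int c = 0; c < maxCycles; ++c) {
        if (bvalid) { t.bCycle = c; break; }          // B handshake (BREADY = 1)
        if (t.awCycle < 0 && awready(c)) t.awCycle = c;
        if (t.wCycle < 0 && wready(c)) t.wCycle = c;
        bvalid = t.awCycle >= 0 && t.wCycle >= 0;     // registered: seen next cycle
    }
    // The write response never precedes either of the two handshakes.
    assert(t.bCycle < 0 || (t.bCycle > t.awCycle && t.bCycle > t.wCycle));
    return t;
}
```

For example, `simulateWrite([](int c) { return c >= 3; }, [](int) { return true; }, 100)` gives wCycle 0, awCycle 3 and bCycle 4: the data beat is accepted first, but the write response waits for the address. If awready never rises, the W beat is still accepted and no write response ever arrives.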

Also, I would instantiate your IP alone and attach some AXI VIPs (AXI Verification IPs), which Xilinx provides in Vivado and which are programmed in SystemVerilog, to produce the responses you want while recording all your data. You will also be able to look at all the waveforms and detect where your issues are.

You can have your IP write into one of these AXI4VIP configured in slave mode, or you can write to a BRAM.

I'll leave here some documentation.

Fra93
  • Thank you for this detailed analysis. I will look into it more, or I guess find a workaround. However, I was unable to find anything in the AXI4 specification that requires the `AWDATA` handshake to occur before the `WDATA` handshake. But this might still be the problem. I will try stalling `AWDATA` instead. – PhilMasteG Jun 22 '22 at 10:15
  • AFAIK there is no `AWDATA` but only `AWADDR` during the first axi handshake. Here: http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI4_specification.pdf#G7.4948405, page 41, you can find the transaction dependencies between signals. – Fra93 Jun 22 '22 at 12:32
  • Yes, I misspoke. I meant AWADDR. However, in that document too I can find no dependency saying that AWVALID/AWREADY has to occur before WVALID/WREADY. It also wouldn't make sense to me: why not transmit both at the same time? Anyway, you've helped a lot, and if there are no further answers you'll be awarded the bounty, so thanks. – PhilMasteG Jun 23 '22 at 06:58