I have designed a 16*16 Montgomery multiplier. The code uses a single 16*16 multiplier to perform three multiplications. The multiplications are performed one after the other on the same multiplier, and the result of each multiplication is stored in a register. The 16*16 multiplier on its own runs at about 1550 MHz, but the Montgomery multiplier (which uses that one multiplier three times in series) only reaches about 500 MHz. I want to avoid this drop and run the whole design at the frequency of the single multiplier. How can I do this?

The code is given below. (Only the multiplications are shown; the additions and shifts have been left out for simplicity.)

`define m 11
`define mbar 245
module test_mul(a,b,clk,reg2,reset);

input [15:0] a,b;
input clk,reset;
output reg [31:0] reg2;


reg [15:0] x,y;

reg [31:0] reg0,reg1;
reg [5:0] count;

wire [31:0]p;


test_mul16 a1 (x,y,clk, p);


always @ (posedge clk)
begin
if (reset)
begin  x <= a; y <= b; count= 6'd0; end

else begin

if (count == 11)
reg2 <= p;
if (count == 12)
begin x <= reg0[15:0]; y <=`mbar; end
if (count == 27)
reg1 <= p;
else if (count == 28)
begin
x <= reg1[15:0];
y <= `m;
end
else if (count == 39)
begin
reg2 <= p;
end

count = count+1;

end
end

endmodule

module test_mul16(a,b,clk,reg2);

input [15:0] a,b;
input clk;
output reg [31:0] reg2;

reg [31:0] reg0, reg1;
always @ (posedge clk)
begin
reg0<= a*b;
reg1<=reg0;
reg2<=reg1;
end
endmodule
  • Your FPGA-tool may have decided that a different type of multiplier is more optimal for the configuration. You'll need to constrain the tool's freedom using constraints. But frankly I have a hard time understanding what you are trying to do. What are these seemingly random counter values? Why is test_mul16 passing the multiplication through registers in order to delay the output? And reg0 does not get set to a value before it is used. – Hida Feb 16 '17 at 15:41
  • A couple of questions to confirm the issue. Is this a timing question? Are you saying that the design met timing at a clock frequency of 1.55 GHz when performing a single multiplication, but only meets 500 MHz when the multiplications are performed in series? There is a state variable (count) that seems to produce a lot of do-nothing cycles, so I am fairly certain this is a backend timing question, but I want to confirm. – Rich Maes Feb 16 '17 at 23:06
  • @Hida Since the multiplier requires a certain number of cycles to complete its operation, the counter counts the clock cycles and provides the delay the multiplier needs when used in series. The registers are used to store the results of the multiplications: reg0 for the first result, reg1 for the second, and reg2 for the third. – Safi Jadoon Feb 17 '17 at 10:12
  • @RichMaes Yes, it is a timing issue. The counter just counts the number of clock cycles the single multiplier requires to complete the operation. The first multiplication is started, then we wait a few clock cycles for the result, the result is stored in a register, and the multiplier is given new operands. The multiplier again needs some clock cycles, and this process repeats three times. I just want the whole design to work at the frequency of the single multiplier, which is almost 1500 MHz. – Safi Jadoon Feb 17 '17 at 10:26
  • @SafiJadoon The registers inside `test_mul16` do not contain results from different multiplications; the values in `regN` are updated on every clock cycle and merely delay the output of the actual multiplication. More to the point, you should probably use your FPGA tool to implement a multiplier instead of using `a*b`. This will give you better control over how it works. Finally, I suggest that you choose a multiplier which has some kind of `result_valid` signal that can be used instead of a counter. See if this helps: https://www.altera.com/en_US/pdfs/literature/an/an306.pdf – Hida Feb 20 '17 at 14:08

1 Answer

OK, so based on the comment confirming that this is a timing issue, I think there could be a couple of things going on here. I can help you improve timing, but I am not certain that we can get to 1.5 GHz. You should also let us know which vendor you are using.

You have an if with a reset, but you do not reset all variables. That is okay as long as you know that nothing is left uninitialized. The real point, though, is that many newer FPGA technologies prefer that you avoid resets where you don't need them. I notice that you are resetting x and y with the inputs a and b. Do you have to do this? If you do not have to load x and y from a and b on reset, you can remove them from the reset branch, and that should help timing.

Your state machine (using the variable count) is not one-hot. You might look at recoding it as one-hot; that should give you a little boost.

To do this, make count a 40-bit register, reset it to 40'd1, and then on each clock assign it as `count <= {count[38:0], count[39]};` and use the individual bits to trigger your logic.
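As a minimal sketch of that rotating one-hot counter (the module name `onehot_counter` and its port list are only illustrative, not from the question):

```verilog
// Sketch: 40-state one-hot ring counter.
// Bit n of count is high n cycles after reset is released (modulo 40).
module onehot_counter (
    input  wire        clk,
    input  wire        reset,
    output reg  [39:0] count
);
    always @(posedge clk) begin
        if (reset)
            count <= 40'd1;                     // only bit 0 set: state 0
        else
            count <= {count[38:0], count[39]};  // rotate left by one position
    end
endmodule
```

Note that this counter repeats every 40 cycles, whereas the 6-bit binary counter in the question wraps after 64, so the idle time between runs would differ unless the width is adjusted.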

Next, take a look at your ifs. You have a bunch of one-off ifs. In some cases, you have multiple ifs assigning the same variable. This is probably okay, but the synthesizer probably has to work some things out, and the result might not be as efficient as it could be if you coded it differently. Try using a case statement. If you follow the one-hot suggestion above, each state is a single bit of count, so your case statement will look like this: `case (1'b1) count[11] : begin /* do some stuff */ end count[12] : begin /* do some other stuff */ end ... endcase`

Finally, also in your ifs, you have a mix of if and else if. Fold those into the case statement above, because as written you are basically assigning priority among counts 27, 28 and 39. For a single variable, there can and should be no priority between the values. The value is either 27, 28, or 39, or something else, and the logic never has to choose one state over another.
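Putting the one-hot counter and the case statement together, the sequencer from the question might be recoded roughly as follows. This is a sketch under assumptions, not a verified fix: the module name `test_mul_onehot` is made up, it relies on the question's `test_mul16` module, and it assumes the first product is meant to be captured into reg0 at count 11 (the question's code writes reg2 there).

```verilog
`define m    11    // same values as in the question
`define mbar 245

// Sketch only: the question's sequencer with a one-hot count.
module test_mul_onehot (
    input  wire [15:0] a, b,
    input  wire        clk, reset,
    output reg  [31:0] reg2
);
    reg  [15:0] x, y;
    reg  [31:0] reg0, reg1;
    reg  [39:0] count;
    wire [31:0] p;

    test_mul16 a1 (x, y, clk, p);  // pipelined multiplier from the question

    always @(posedge clk) begin
        if (reset) begin
            x     <= a;            // first operands are still loaded on reset
            y     <= b;
            count <= 40'd1;        // one-hot state 0
        end else begin
            count <= {count[38:0], count[39]};
            case (1'b1)            // matches whichever single bit is set
                count[11]: reg0 <= p;   // assumed: first product goes to reg0
                count[12]: begin x <= reg0[15:0]; y <= `mbar; end
                count[27]: reg1 <= p;
                count[28]: begin x <= reg1[15:0]; y <= `m; end
                count[39]: reg2 <= p;
                default: ;         // all other states do nothing
            endcase
        end
    end
endmodule
```

Because exactly one bit of count is set at any time, the case items are mutually exclusive by construction. Whether the synthesizer exploits that on its own depends on the tool; in SystemVerilog, `unique case` is one way to state it explicitly.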

If you make some of those changes, your speed should go up. I would really like to know which vendor is reporting 1.5 GHz, though.

Rich Maes