Efficient use of ALMs (Adaptive Logic Modules)?

Question

I have a Verilog design that compiles to ~15K LEs on a Cyclone IV (EP4CE22F17C6N). When I compile the same code on a Cyclone V (5CEFA2F23C8N), it takes ~8500 ALMs. Based on Altera's own LE equivalency for that particular Cyclone V, this would be ~20K LEs. Now, I realize that the estimates are going to be highly dependent on the particular design, but a 33% increase in "effective" resource utilization seems like a lot.

So it makes me wonder if there are design tips/tricks/etc. for making more efficient use of ALMs. In particular, I'm looking for Verilog constructs that would improve the register density, fabric density, dense packing, etc.

Those comparison figures between different families are rough ballpark estimates. They tend to err on the optimistic side and you are in the ballpark already. Wider LUTs can't always be taken advantage of in designs that don't have sufficiently large clouds of combinational logic between flip-flops. If you don't meet that criterion you're going to consume more ALMs. — Kevin Thibedeau, Sep 08 '14 at 21:00
In general, don't worry about writing code that fits the hardware architecture, since the tools know the hardware details best and generally do a very good job. The designer should focus on writing code that is correct and maintainable. Saving engineering time, for the benefit of time to market, is usually more important than saving some ALMs. An exception is a structure that is repeated a large number of times; then it might make sense to optimize it. — Morten Zilmer, Sep 09 '14 at 04:05
@Morten As it happens, the code does have a rather large module that is repeated a number of times. The particular code is the [Parallax Propeller source](http://www.parallax.com/microcontrollers/propeller-1-open-source). I'm not looking to re-write code specifically for ALMs, just to get a better understanding of how to make good use of them. I'm interested in the low-hanging fruit, not the hardcore stuff. As you point out, I'd rather code for maintainability than performance. — seairth, Sep 09 '14 at 12:36
Are you using the same synthesis and P&R settings for both projects? A project like the one you describe can be optimized in many ways, and if the tool settings are not the same, you will be comparing apples with bananas. — FarhadA, Sep 09 '14 at 13:09
@FarhadA Yes, it is exactly the same toolchain (Quartus 14.0) in both cases. — seairth, Sep 11 '14 at 14:37
Well, it also depends on the synthesis optimization level, the P&R strategy you use, whether you choose to share resources, and many other options. With FPGAs you must be careful when comparing numbers like this. It can also depend on the location of the IP: if the IP is placed where routing resources are limited or not free, the number of LEs in your design will increase, with many of them used for routing rather than actual logic. — FarhadA, Sep 12 '14 at 07:42
In the end, I'm seeing that the LEs and ALMs are just different beasts. There is almost no point in comparing the two. I've chosen Chiggs' answer because it provides some actual things to try. But the only surefire way, it seems, to optimize for ALMs is to experiment with the code. As pointed out above, this is probably not worth the trade-off in increased complexity and/or reduced clarity. — seairth, Sep 15 '14 at 15:14

score 1 · Accepted Answer · edited May 23 '17 at 12:21

I would agree with the comments above that generally you shouldn't need to optimise; however, it's always important to check that your code maps well to the chosen architecture. Specifically:

Reset

Using the wrong kind of reset for your architecture can cause problems. It's also very easy to accidentally cause the synthesis tool to insert logic to emulate a clock-enable. For full details see this answer. For Altera you should be using an asynchronous reset which is synchronously de-asserted.
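A minimal sketch of that reset style, assuming an active-low external reset and a two-stage synchroniser. The module and signal names (`reset_sync`, `arst_n`, `rst_n`) are hypothetical and not taken from the Propeller source:

```verilog
// Hypothetical reset synchroniser: asserts asynchronously,
// de-asserts synchronously to clk.
module reset_sync (
    input  wire clk,
    input  wire arst_n,  // raw asynchronous reset, active low (assumed polarity)
    output wire rst_n    // reset to distribute to the rest of the design
);
    reg [1:0] sync;

    always @(posedge clk or negedge arst_n) begin
        if (!arst_n)
            sync <= 2'b00;            // assert immediately, no clock needed
        else
            sync <= {sync[0], 1'b1};  // release only after two clock edges
    end

    assign rst_n = sync[1];
endmodule
```

Downstream registers would then use `rst_n` in their sensitivity lists as an ordinary asynchronous clear.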

Priority of control signals

In Altera:

Asynchronous Clear, aclr—highest priority
Asynchronous Load, aload
Enable, ena
Synchronous Clear, sclr
Synchronous Load, sload
Data In, data—lowest priority
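As a sketch, a register whose `if`/`else` nesting follows that priority order might look like the following. The module name, port names and width are hypothetical, and `aload` is left out for brevity. Whether the tool actually maps each signal onto the dedicated register input depends on the device and settings, so the resource report is worth checking:

```verilog
// Hypothetical register coded in the priority order listed above:
// aclr > ena > sclr > sload > data.
module prio_reg #(
    parameter W = 8
) (
    input  wire         clk,
    input  wire         aclr,   // asynchronous clear, highest priority
    input  wire         ena,    // clock enable
    input  wire         sclr,   // synchronous clear
    input  wire         sload,  // synchronous load
    input  wire [W-1:0] ldata,  // value loaded when sload is high
    input  wire [W-1:0] d,      // normal data input, lowest priority
    output reg  [W-1:0] q
);
    always @(posedge clk or posedge aclr) begin
        if (aclr)
            q <= {W{1'b0}};
        else if (ena) begin
            if (sclr)
                q <= {W{1'b0}};
            else if (sload)
                q <= ldata;
            else
                q <= d;
        end
    end
endmodule
```

Nesting the controls in a different order (for example, testing `sclr` outside `ena`) describes different behaviour, which the tool may have to build from extra LUT logic in front of the register.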

Latches

Easy to grep from the reports, but unless you're absolutely sure it's intentional, latches are generally bad mmmmkay.
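A common source of unintended latches is a combinational `always` block that doesn't assign its output on every path. One way to avoid that, sketched here with a hypothetical 3-input selector, is to give the output a default value at the top of the block:

```verilog
// Hypothetical selector: the default assignment and the default case
// branch mean y is driven on every path, so no latch is inferred.
module mux_no_latch (
    input  wire [1:0] sel,
    input  wire       a,
    input  wire       b,
    input  wire       c,
    output reg        y
);
    always @* begin
        y = 1'b0;              // default value covers any unlisted case
        case (sel)
            2'b00:   y = a;
            2'b01:   y = b;
            2'b10:   y = c;
            default: y = 1'b0; // sel == 2'b11
        endcase
    end
endmodule
```

Without the default assignment and the `default` branch, `sel == 2'b11` would require `y` to hold its previous value, which is what makes the tool infer a latch.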

Synthesis

There are many options available for tweaking the behaviour of the synthesis process. Here are a few that will affect your results:

ALM_REGISTER_PACKING_EFFORT

This option guides the Fitter when packing registers into ALMs.

MUX_RESTRUCTURE

Allows the Compiler to reduce the number of logic elements required to implement multiplexers in a design.

OPTIMIZATION_TECHNIQUE

Specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance, minimize logic usage, or balance high performance with minimal logic usage.

Bear in mind that if your device isn't getting too full, the tool won't have much "incentive" to minimise logic utilisation unless you explicitly tell it to.
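These options are normally set in the project settings, but Quartus also accepts assignments embedded in the HDL through the `altera_attribute` synthesis attribute. The sketch below applies two of the options above to a single module. The module itself is a hypothetical placeholder, and the assumption that both options are honoured at module scope should be confirmed in the Analysis & Synthesis report. `ALM_REGISTER_PACKING_EFFORT` is a Fitter setting and is left to the project settings here:

```verilog
// Hypothetical module carrying per-module synthesis options.
// Assumption: both assignments take effect at module scope; check the
// Analysis & Synthesis report to confirm they were applied.
(* altera_attribute = "-name OPTIMIZATION_TECHNIQUE AREA; -name MUX_RESTRUCTURE ON" *)
module repeated_block (
    input  wire       clk,
    input  wire [1:0] sel,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [7:0] c,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    // Registered 4:1 multiplexer, the kind of structure
    // MUX_RESTRUCTURE is meant to shrink.
    always @(posedge clk) begin
        case (sel)
            2'b00:   q <= a;
            2'b01:   q <= b;
            2'b10:   q <= c;
            default: q <= d;
        endcase
    end
endmodule
```

Scoping the options to the module that is instantiated many times keeps area optimisation where it pays off, while the rest of the design stays on the default settings.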

So far, changing some asynchronous enables to synchronous enables seemed to help somewhat. Of course, that changes the behavior a bit, so I am not yet sure if such a change is safe in this context. Otherwise, choosing "optimize space" for technique seemed to have a significant benefit on ALM usage. Again, this has probably also impacted timing, so it too may or may not be a good choice. — seairth, Sep 15 '14 at 15:06
