Could anyone explain the difference in the partial-product reduction method or mechanism between Wallace and Dadda multipliers?
I have been reading A_comparison_of_Dadda_and_Wallace_multiplier_delays.pdf
Both are very similar. Instead of the traditional row-based algorithm, both implement a multiplication A*B by 1/ ANDing A with each bit b_i of B to form the partial products, 2/ counting the bits in every column until only two rows remain, and 3/ performing the final addition with a fast adder.
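To make the three steps concrete, here is a small behavioural sketch in Python (an illustration, not a gate-level design): partial products are kept as lists of bits per column, columns are reduced with full adders ((3,2) counters) until no column holds more than two bits, and the two surviving rows are added. The function name and the column representation are my own choices.

```python
def multiply(a, b, n=8):
    # Step 1: partial products -- AND a with every bit b_i of b.
    # cols[k] holds the bits of weight 2^k.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        if (b >> i) & 1:
            for j in range(n):
                cols[i + j].append((a >> j) & 1)

    # Step 2: reduce every column to at most two bits with full adders
    # ((3,2) counters); each adder puts its sum bit back in the same
    # column and its carry bit in the next column.
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]  # room for top carries
        for k, c in enumerate(cols):
            while len(c) >= 3:
                x, y, z, c = c[0], c[1], c[2], c[3:]
                nxt[k].append(x ^ y ^ z)                        # sum
                nxt[k + 1].append((x & y) | (x & z) | (y & z))  # carry
            nxt[k].extend(c)  # 0, 1 or 2 leftover bits pass through
        cols = nxt

    # Step 3: the two remaining rows go to a fast adder
    # (modelled here by ordinary integer addition).
    row0 = sum((c[0] if len(c) > 0 else 0) << k for k, c in enumerate(cols))
    row1 = sum((c[1] if len(c) > 1 else 0) << k for k, c in enumerate(cols))
    return row0 + row1
```

Wallace and Dadda differ only in how step 2 schedules the adders, which is exactly the question.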
I worked on a Dadda multiplier, but that was many, many years ago, and I am not sure I remember all the details. To the best of my knowledge, the main differences are in the counting process.
Wallace introduced the "Wallace tree" structure (which is still useful in some designs). Given n bits, it counts the number of bits at 1 in this set. An (n,m) Wallace tree (where m=ceil(log_2 n)) outputs, on m bits, the number of 1s among its n inputs; it is in effect a combinational counter. For instance, below is the schematic of a (7,3) Wallace tree made of full adders (which are themselves (3,2) Wallace trees).
As you can see, this tree generates result bits of logical weight 2^0, 2^1 and 2^2, assuming the input bits are of weight 2^0.
This allows a fast reduction of the column heights, but can be somewhat inefficient in terms of gate count.
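For reference, the (7,3) tree in the figure can be sketched from four full adders like this (one common wiring, given as an illustration; other arrangements of the four adders exist):

```python
def full_adder(a, b, c):
    """A (3,2) counter: returns (sum, carry)."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return s, cout

def wallace_7_3(bits):
    """(7,3) Wallace tree: counts the 1s among 7 input bits of weight
    2^0 and returns three result bits of weights 2^0, 2^1, 2^2."""
    b0, b1, b2, b3, b4, b5, b6 = bits
    s1, c1 = full_adder(b0, b1, b2)   # weight-1 inputs -> weights 1 and 2
    s2, c2 = full_adder(b3, b4, b5)   # weight-1 inputs -> weights 1 and 2
    s3, c3 = full_adder(s1, s2, b6)   # weight-1 inputs -> weights 1 and 2
    s4, c4 = full_adder(c1, c2, c3)   # weight-2 inputs -> weights 2 and 4
    return s3, s4, c4                 # weights 2^0, 2^1, 2^2
```

The three outputs always satisfy s3 + 2*s4 + 4*c4 = popcount(inputs), which is the defining property of the counter.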
Luigi Dadda does not use such an aggressive reduction strategy and tries to keep the column heights more balanced. Only full adders (or half adders) are used, so every counting/reduction step only generates bits of weight 2^0 and 2^1. The reduction process is less efficient (as can be seen from the larger number of rows in your figure), but the gate count is better. Dadda's strategy was also supposed to be slightly less time efficient, but according to the enclosed paper, which I did not know, this is not the case.
The main interest of Wallace/Dadda multipliers is that they achieve a multiplication in O(log n) time, which is much better than the O(n) of the traditional array multiplier with carry-save adders. But despite this theoretical advantage, they are not really used any longer. Present architectures are more concerned with throughput than latency, and prefer simpler array structures that can be efficiently pipelined. Implementing a Wallace/Dadda structure is a real nightmare beyond a few bits, and pipelining it is very complex due to its irregular structure.
Note that other multiplier designs also achieve O(log n) time complexity, with a more regular and more easily implementable divide-and-conquer strategy, for instance the Luk-Vuillemin multiplier.