3

The following code allows me to determine distinct values in a pivot table in Excel:

=SUMPRODUCT(($A$A:$A2=A2)*($B$2:$B2=B2))

See also: Simple Pivot Table to Count Unique Values

The code runs perfectly fine. However, can somebody help me understand how this code actually works?

Community
  • 1
  • 1
user3116232
  • 433
  • 5
  • 17

2 Answers2

3

Firts, i think you made a type here, as the formula should be :

=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))

Let's decompose it in 2 parts:

  • First, we check the cells between A2 and A2, so only one cell, and we check the number of cells wich are equals to A2. In this case, the output should be 1, as you're comparing A2 with A2. However, you're not limited to compare A2 with A2. If you had chosen 2 cells equals to A2, the results would have been 2.You can compare as many cells as you want with A2 (replace the characters after the $ to modulate). We do the same for the second bracket, except the pivot value is B2.
  • After that, you need to understand what the function SUMPRODUCT does. It sum the value of the product for a range of array. For example, say you have the value 1 on A1, 1 on A2, 2 on B1 and 3 on B2, if you make SUMPRODUCT((A1:A2)*(B1:B2)) , you will obtain (1*2) + (1*3) = 5. So, in the example you gave us, it will give the sum of (A2=A2)*(B2=B2) = 1.

So, it will output the number of pair (Ax,Bx) which is equals to (A2,B2). With the link, you can see that, if you select the first line only, the function will output 1 (and so the IF will output 1), but if you select the first 2 lines, the function will output 2, (and so the IF will output 0).

I hope this made sense to you, as i hoped i didn't make any mistakes along the explanation.

Peut22
  • 304
  • 1
  • 11
3

You write: the following code allows me to determine distinct values in a pivot table in Excel

No. That formula alone does not do that. Read on for the explanation of what does.

There's a typo in the formula. It should be

=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))

See the difference?

The formula starts in row 2 and is copied down. In each row, the $A$2 reference and the $B$2 reference will stay the same. The $ signs make them absolute references. The relative references $A2 and A2 will change their row numbers when copied down, so in row 3 the A2 will change to A3 and B2 will change to B3. In the next row it will be A4 and B4, and so on.

You may want to create a sample scenario with data similar to that in the thread you link to. Then use the "Evaluate Formula" tool on the Formulas ribbon to see step by step what is calculated. The formula evaluates from the inside out. Let's assume the formula has been copied down to row 5 and we are now looking at

=SUMPRODUCT(($A$2:$A5=A5)*($B$2:$B5=B5))

($A$2:$A5=A5) this bit compares all the cells from A2 to A5 with the value in A5. The result is an array of four values, either true or false. The next bit ($B$2:$B5=B5) also returns an array of true or false values.

enter image description here

These two arrays are multiplied and the result is an array of 1 or 0 values. Each array has the same number of values.

The first value of the first array will be multiplied with the first value of the second array. (see the red arrows)

The second value of the first array will be multiplied with the second value of the second array. (see the blue arrows)

and so on.

True * True will return 1, everything else will return 0. The result of the multiplication is:

enter image description here

The nature of the SumProduct function is to sum the result of the multiplications (the product), so that is what it does.

This function alone does not do anything at all to establish distinct values in Excel. In the thread you link to, the Sumproduct is wrapped in an IF statement and THAT is where the distinct values are identified.

=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)

In plain words: If the combination of the value in column A of the current row and column B of the current row has already appeared above, return a zero, otherwise, return a 1.

This marks distinct values of the combined columns A and B.

teylyn
  • 34,374
  • 4
  • 53
  • 73