7

I have just learnt how to Pivot in SQL Server. I was wondering why the max function is used when we want to pivot text columns? What's the logic behind this? I understand if it's Count, Sum etc (because your summing that respective row and column) but I don't understand the logic of using max when we have text columns?

For example, my code is:

SELECT * 
  FROM ( SELECT DATE
               ,SITA
               ,EVENT 
          FROM  [UKRMC].[dbo].[strategy] 
          where datename(year, DATE) = 2018 or datename(year,DATE)=2019
        ) strategy
  PIVOT ( max(EVENT)
          FOR SITA IN ([ABZPD],[BFSPD]
,[BFSZH]
,[BHXPD]
,[BHXZH]
,[BRSZH]
,[BRUPQ] ) piv
Andrea
  • 11,801
  • 17
  • 65
  • 72
Sorath
  • 543
  • 3
  • 10
  • 32
  • It's because an aggregate function **has** to be used. In many cases there is only one value, `max` selects that one record, while still following pivot syntax rules. – HoneyBadger Jan 03 '18 at 13:22
  • As an aside `where datename(year, DATE) = 2018 or datename(year,DATE)=2019` isn't ideal, as it could result in a [non-SARGable](https://stackoverflow.com/q/799584/314291) query plan, i.e. any potential indexes on column `DATE` won't be hit. – StuartLC Jan 03 '18 at 14:06
  • It would be desirable to have something like a `SINGLE_VALUE` aggregate in SQL for precisely these situations! – Steve Jan 03 '18 at 14:43

2 Answers2

6

Because in your example you've chosen EVENT as the value to show in the PIVOT intersections (i.e. since you've specified EVENT in the PIVOT clause), the value must be specified with one of the permissible aggregate functions, as there are potentially multiple rows for each of the column values that you've chosen in your pivot, when grouped by the remaining columns (i.e. DATE in your case).

In Sql Server[1], MAX() or MIN() is commonly used when pivoting non-numeric columns, as it is able to show one of the original of the values of the column.

Any non-aggregate and non-pivoted columns will be left as-is and will be used to form the groups on which the pivot is based (in your case, column DATE isn't either in the aggregate, or the column pivot, so it will form the row group)

Consider the case where your pivoted table contains multiple rows matching your predicate, such as this:

INSERT INTO strategy (DATE, SITA, EVENT) VALUES
('1 Jan 2018', 'ABZPD', 'Event1'),
('1 Jan 2018', 'BFSPD', 'Event2'),
('1 Jan 2018', 'BFSPD', 'Event3');

After Pivot:

DATE                    ABZPD   BFSPD
2018-01-01T00:00:00Z    Event1  Event3

i.e. During the Pivot, the BFSPD rows for Event2 and Event3 needed to somehow be projected into a single cell - hence the need for an aggregate. This aggregate is still needed, even if there is known to be just one value (this being the case for the Event1 value for SITA ABZPD in the above example).

Since BFSPD has two events, you'll need to somehow resolve how to project a value into a single cell value. The use of MAX on the VARCHAR column resolves the 'largest' value (Event3) in the event of multiple rows in projecting into the same resulting pivot 'cell' - SqlFiddle example here

You could choose to use COUNT(Event) to show you the number of events per row / pivot intersection - Fiddle

And you could switch the aggregate on EVENT with DATE - EVENT is thus used in the column grouping.


*1 Aggregates like AVG or STDEV are obviously not available to strings. Other RDBMS have additional aggregates like FIRST which will arbitrarily take the first value, or GROUP_CONCAT / LIST_AGG, which can fold string values together with a delimiter. And PostGres allows you to make your own aggregate functions!. But sadly, none of this in SqlServer, hence MIN() / MAX() for now.

StuartLC
  • 104,537
  • 17
  • 209
  • 285
4

An aggregation function must be specified when using PIVOT command because the first step of the pivoting operation is a grouping operation on the column specified in the FOR clause that reduces the number of lines of the resulting tables.

The aggregation function is used to manage values for the other columns that are required in the output table.

From Technet documentation:

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

Here is the PIVOT command syntax taken from the same Technet article:

SELECT <non-pivoted column>,  
    [first pivoted column] AS <column name>,  
    [second pivoted column] AS <column name>,  
    ...  
    [last pivoted column] AS <column name>  
FROM  
    (<SELECT query that produces the data>)   
    AS <alias for the source query>  
PIVOT  
(  
    <aggregation function>(<column being aggregated>)  
FOR   
[<column that contains the values that will become column headers>]   
    IN ( [first pivoted column], [second pivoted column],  
    ... [last pivoted column])  
) AS <alias for the pivot table>  
<optional ORDER BY clause>;  

Please note that after the PIVOT clause you must specify an aggregation function:

...
<aggregation function>(<column being aggregated>)
...

For an additional insight on this topic see also this Microsoft Press article.

Andrea
  • 11,801
  • 17
  • 65
  • 72