Pivoting table in SQL to columns with maximum number of unique observation

Question

I have a sql table:

+----+--------------+-----------+
| ID |    Genre     | Frequency |
+----+--------------+-----------+
|  1 | Rock         | 0.11      |
|  1 | Punk         | 0.22      |
|  1 | Glam         | 0.33      |
|  2 | Metal        | 0.44      |
|  2 | Heavy Metal  | 0.55      |
|  2 | Thrash Metal | 0.66      |
+----+--------------+-----------+

I know how pivot this to:

+----+------+------+------+-------+-------------+--------------+
| ID | Rock | Punk | Glam | Metal | Heavy Metal | Thrash Metal |
+----+------+------+------+-------+-------------+--------------+
|  1 | 0.11 | 0.22 | 0.33 | NULL  | NULL        | NULL         |
|  2 | NULL | NULL | NULL | 0.44  | 0.55        | 0.66         |
+----+------+------+------+-------+-------------+--------------+

but I want table in this format:

+----+---------+-------------+-------------+-------------+--------------+-------------+
| ID | genre 1 | frequency 1 |   genre 2   | frequency 2 |   genre 3    | frequency 3 |
+----+---------+-------------+-------------+-------------+--------------+-------------+
|  1 | Rock    | 0.11        | Punk        | 0.22        | Glam         | 0.33        |
|  2 | Metal   | 0.44        | Heavy Metal | 0.55        | Thrash Metal | 0.66        |
+----+---------+-------------+-------------+-------------+--------------+-------------+

This may seem weird because in most data science case, we want to have a "meaning" to the a particular column. But in this case genre 1 does not have any "meaning". But the reason I want it in this format is because, I have a lot of artist with vary different genres. So if I have 1000 different genre and I pivot in the usual way - I will end up with 1000 different columns, one for each genre. A lot of them will be Null though.

But, In my way of pivoting, even if I have 1000 different genre, column number will be "number of genre for the artist with highest number of genre". So, in my example, if I add another artist with 04 totally different genre, I will just have to add another column (instead of 04).

Additionally, If I could sort by frequency within each individual artist, the variable genre 1 will actually have meaning. It will mean "the most frequent genre".

Please note, I am not allowed to create any new column in the main database.

So is your database MySQL or SQL Server? Please remove the incorrect tag — Dale K, Nov 30 '18 at 00:04

LukStorms · Accepted Answer · 2018-12-04T08:42:47.737

0

It can be done via PIVOT, but it ain't pretty.

WITH CTE AS
(
   SELECT ID, Genre, CAST(Frequency AS VARCHAR(30)) AS  Frequency,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
   FROM ArtistGenres
)
SELECT *
FROM
(
     SELECT ID, CONCAT(ca.Name,' ',rn) AS Col, ca.Val
     FROM CTE
     CROSS APPLY (VALUES ('genre', Genre),('frequency', Frequency)) ca(Name, Val)      
) src
PIVOT
(
  MAX(Val)
  FOR Col IN ([genre 1], [frequency 1], [genre 2], [frequency 2], [genre 3], [frequency 3])
) pvt;

Or via conditional aggregates:

SELECT ID,
MAX(CASE WHEN rn = 1 THEN Genre END) AS [genre 1],
MAX(CASE WHEN rn = 1 THEN Frequency END) AS [frequency 1],
MAX(CASE WHEN rn = 2 THEN Genre END) AS [genre 2],
MAX(CASE WHEN rn = 2 THEN Frequency END) AS [frequency 2],
MAX(CASE WHEN rn = 3 THEN Genre END) AS [genre 3],
MAX(CASE WHEN rn = 3 THEN Frequency END) AS [frequency 3]
FROM
(
   SELECT ID, Genre, Frequency, 
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
   FROM ArtistGenres
) q
GROUP BY ID
ORDER BY ID;

db<>fiddle here

edited Dec 04 '18 at 08:42

answered Nov 30 '18 at 00:27

LukStorms

28,916
5
31
45

@ImrulHuda For completeness sake I added an extra method. There was another answer that used it, but it got deleted. It has 1 benefit over the PIVOT. The frequencies are returned in their original data type. – LukStorms Dec 03 '18 at 20:26
Thanks. Also, there are actually 45 genre, frequency combination. I extended your method and manually typed in the whole 45 set of genre, frequency. It worked. Just wondering whether there is a better method. – Imrul Huda Dec 04 '18 at 21:03
@ImrulHuda In MS Sql there's no way to pivot in a normal SQL without declaring the column names. But you can by using a Dynamic Sql. Basically building a sql statement in a string and then executing it. F.e. in [this old post](https://stackoverflow.com/a/52226502/4003419) I did such a thing. – LukStorms Dec 04 '18 at 21:55
Well ok, there's also methods in the newer MS Sql Server versions to collect grouped data in an [XML](https://learn.microsoft.com/en-us/sql/relational-databases/xml/create-xml-data-type-variables-and-columns) or [JSON](https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server) data type. But that wouldn't get your expected output. It's more usefull for those who then need to display & format it on a front-end. – LukStorms Dec 04 '18 at 22:13

Pivoting table in SQL to columns with maximum number of unique observation

1 Answers1