0

I have the following tables:

RATINGS

User-ID     ISBN    Book-Rating
244662  0373630689  3
19378   0812515595  2
238625  0441892604  5
180315  0140439072  1
242471  3548248950  1

BOOKS ISBN Book-Title Book-Author Year-Of-Publication Publisher 0393000753 A Reckoning May Sarton 1981 W W Norton

Using MySQL, I want to create a table that looks like this:

----- User1 User2 User3 ... ISBN1 Rating11 NaN Nan ISBN2 NaN Rating21 Rating23 ISBN3 Rating31 NaN NaN ...

I've learned how to do it for a fixed number of columns here. But how to convert the rows into columns? I have tried something like

create view BX-Book-Ratings-Extended as ( select ISBN, case whenBook-Rating= "1" then 1 end asUser-ID, case whenBook-Rating= "2" then 2 end asUser-ID, case whenBook-Rating= "3" then 3 end asUser-ID, case whenBook-Rating= "4" then 4 end asUser-ID, case whenBook-Rating= "5" then 5 end asUser-ID fromBX-Book-Ratings )

which doesn't work - the view is empty... I also tried to adopt a solution from another thread, but keep getting a syntax error I can't identify:

SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'ifnull(SUM(case when `Book-Rating` = ''',
      `User-ID`,
      ''' then `Book-Rating` end),0) AS ',
      `User-ID`
    )
  ) INTO @sql
FROM
  `BX-Book-Ratings`;
SET @sql = CONCAT('SELECT ISBN, ', @sql, ' 
                  FROM BX-Book-Ratings 
                   GROUP BY ISBN');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

The dataset is here.

Thanks for any advice!

mirgee
  • 390
  • 2
  • 13
  • Why do you want to do this? – Strawberry Jul 21 '17 at 22:54
  • @Strawberry In order to train a recommendation system. – mirgee Jul 21 '17 at 23:01
  • It seems unlikely that such a system would be built inside MySQL – Strawberry Jul 21 '17 at 23:39
  • @Strawberry I need a database to store the data. Why not MySQL? – mirgee Jul 21 '17 at 23:42
  • How many unique users do you have? This could be quite the query. – Jacob H Jul 21 '17 at 23:50
  • You need a database to store the data. But the recommendation system is (presumably) a bunch of algorithms and some kind of friendly interface - so not a database as such. – Strawberry Jul 21 '17 at 23:50
  • @Strawberry I am not following you. The OP said they have two tables in a MySQL database... Ratings (with UserID) and Books... and they want to pivot the ratings for each user. – Jacob H Jul 21 '17 at 23:56
  • @JacobH 105 000 unique users who ever rated and 136 000 books (after grouping by title and author, about twice ISBNs). So yes, it is a few tens of GB uncompressed. – mirgee Jul 21 '17 at 23:56
  • @JacobH Yes - but I think the OP is confused about what a database is for - which parts of a given problem it's sensible for a database to solve – Strawberry Jul 21 '17 at 23:58
  • 1
    Well now that we know there would be 105,000 columns in the pivot I do agree that a pivot seems... unwieldy. I don't even know if MySQL can handle that (I'm SQL Server mostly). A plain old SELECT with a JOIN should suffice I think, as long as OP is OK with multiple rows per user. – Jacob H Jul 22 '17 at 00:03
  • @Strawberry I don't understand what your objections are. Should I not store the data in a database? Should I not use SQL for pivoting? – mirgee Jul 22 '17 at 00:03
  • @JacobH I agree. But a collaborative filtering algos need this format. Do you know of a better approach? – mirgee Jul 22 '17 at 00:06
  • You should store the data in the database. You should not use sql for pivoting – Strawberry Jul 22 '17 at 00:14
  • @Strawberry But you still agree with using pivot? So building the table row-wise from another script? – mirgee Jul 22 '17 at 00:28
  • Well I don't agree with it. But you indicated that it was something you wanted. So, processing a normal result set in application code, so as to output in whatever manner best suits your requirements. – Strawberry Jul 22 '17 at 00:36
  • And how will you display the ultra-wide line?? – Rick James Jul 22 '17 at 16:31
  • @RickJames What line do you mean? I don't need to display the table... – mirgee Jul 22 '17 at 16:45
  • Then why put them in columns? – Rick James Jul 22 '17 at 16:45

1 Answers1

1

Your second attempt with dynamic SQL (prepare/execute) was close.
Try this query:

SET @sql = NULL;

SELECT Concat(
         'SELECT ISBN, ',
         Group_Concat( x SEPARATOR ','),
         ' FROM ratings GROUP BY ISBN '
       )
INTO @sql
FROM (
  SELECT Concat('Max( CASE WHEN User_ID = ', 
                User_ID, 
               ' THEN Book_Rating END ) As User_', 
               User_ID 
         ) As x
  FROM ratings 
  GROUP BY User_ID
  ORDER BY User_ID
) x;

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Demo: http://rextester.com/GLJV4738

If one user can have more ratings for one book, then replace max with sum.


Unfortunately this solution is not supposed to work for more than a copule of thousand of users (columns) due to MySql limits

  • MySQL has hard limit of 4096 columns per table (so the query resultset cannot have more than 4096 columns too)
  • MySQL table has a maximum row size limit of 65,535 bytes (more on this here), so 150.000 users in a row must for sure have more than 65 k bytes
krokodilko
  • 35,300
  • 7
  • 55
  • 79