0

In my table with artist, title columns, I want to search for all unique artist occurencies and count rows by artist (rows can be duplicated). The point is that songs are being performed by duos or even bigger groups and I can't handle such cases. A small part of table as an example:

artist                                  title
Beyoncé                                 Halo
Lady GaGa                               Paparazzi
Lady GaGa                               Poker Face
Lady GaGa                               Poker Face
Lady GaGa & Beyoncé                     Telephone
Rihanna                                 Disturbia
Rihanna & Kanye West & Paul McCartney   FourFiveSeconds

My current query is:

SELECT artist, COUNT(*) AS total
FROM (
    SELECT artist
    FROM list
    GROUP BY artist, title
) AS a
GROUP BY artist
ORDER BY total DESC

Result:

artist      total
Lady GaGa   2
Beyoncé     1
Rihanna     1

My desired result is, respectively, 3, 2 and 2 because both Beyoncé and GaGa have a duet song that is not counted yet and Rihanna has a trio song. Kanye West and Paul McCartney have no solo songs in the example, so I don't want them to be counted. I should probably use REGEXP operator somehow... Any help would be appreciated.

kuznikowski
  • 513
  • 1
  • 5
  • 16
  • 3
    Normalize the schema. Use a table for the single artists and a table for the songs. Have a third table connection artists and songs by storing foreign key to them. Then you can join the three together and group by the artist. – sticky bit May 12 '18 at 16:43
  • You should be able to use a temp table for splitting a string. Your stored procedure would just have to add a loop to process each row in your list table. This SO answer is a great start https://stackoverflow.com/questions/11835155/mysql-split-comma-separated-string-into-temp-table – dbmitch May 12 '18 at 17:20

4 Answers4

1

This would be the brute force approach:

SELECT Artist, COUNT(DISTINCT title) AS Titles
FROM
(
SELECT FirstArtist AS Artist, title
FROM
(SELECT
 SUBSTRING_INDEX(artist, ' & ', 1) AS FirstArtist,
 title
 FROM list) sub
UNION ALL
SELECT SecondArtist, title
FROM
(SELECT
 SUBSTRING_INDEX(SUBSTRING_INDEX(artist, ' & ', 2), ' & ', -1) AS SecondArtist,
 title
 FROM list) sub
UNION ALL
SELECT ThirdArtist, title
FROM
(SELECT
 SUBSTRING_INDEX(artist, ' & ', -1) AS ThirdArtist,
 title
 FROM list) sub
) unpiv
GROUP BY Artist
ORDER BY Titles DESC

I'm sure you can do this with a cte or temp table but I am not familiar enough with MySQL for that. The key in either case is to first unpivot the data and then do your summary.

kjmerf
  • 4,275
  • 3
  • 21
  • 29
  • Should work great as long as the maximum number of artists is limited to three – dbmitch May 13 '18 at 04:43
  • It's almost perfect, however it also counts the non-solo artists (Kanye West and Paul McCartney in the example) and as dbmitch noticed, it is limited to 3 artists – kuznikowski May 13 '18 at 07:10
1

A managed to cobble together SQL that should work for your example based on the small data set in your example.

As I mentioned in my earlier comment the stored procedure splits the artist string and inserts into a temp table. From there it's a simple count and group query like you have above.

After creating the function and procedures you should be able to execute the stored procedure listArtistCounts to get your results

My SO sources are listed just above the code.

SPLIT_STR function

source

delimiter $$

CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');
$$

AddSplitRecords procedure

delimiter $$
CREATE PROCEDURE AddSplitRecords(fullstr)

   BEGIN
      DECLARE a INT Default 0 ;
      DECLARE str VARCHAR(255);
      simple_loop: LOOP
         SET a=a+1;
         -- Split based on " & "
         SET str=SPLIT_STR(fullstr," & ",a);
         IF str='' THEN
            LEAVE simple_loop;
         END IF;

         #Do Inserts into temp table here with str going into the row
         insert into temp_table values (str);
   END LOOP simple_loop;
END $$

listArtistCounts Procedure

source

create procedure listArtistCounts()
begin
-- Variable to hold artist field from query
declare cArtistList varchar(255);

-- Variables related to cursor:
--    1. 'done' will be used to check if all the rows in the cursor 
--       have been read
--    2. 'curArtist' will be the cursor: it will fetch each row
--    3. The 'continue' handler will update the 'done' variable

declare done int default false;
declare curArtist cursor for
    SELECT artist FROM list GROUP BY artist, title; -- This is the query used by the cursor.

declare continue handler for not found -- This handler will be executed if no row is found in the cursor (for example, if all rows have been read).
    set done = true;

CREATE TEMPORARY TABLE IF NOT EXISTS temp_table(artist varchar(255) NOT NULL);

-- Open the cursor: This will put the cursor on the first row of its rowset.
open curArtist;

-- Begin the loop (that 'loop_artist' is a label for the loop)
loop_artist: loop
    -- When you fetch a row from the cursor, the data from the current
    -- row is read into the variables, and the cursor advances to the
    -- next row. 
    -- If there's no next row, the 'continue handler for not found'
    -- will set the 'done' variable to 'TRUE'
    fetch curArtist into cArtistList;

    -- Exit the loop if you're done
    if done then
        leave loop_artist;
    end if;

    -- Execute insert from row.
    AddSplitRecords cArtistList

end loop;

-- Close the cursor when done
close curArtist;

-- Output results of distinct artists counted
SELECT artist,  COUNT(*) AS total FROM temp_table GROUP BY artist ORDER BY total DESC

end $$
delimiter ;
dbmitch
  • 5,361
  • 4
  • 24
  • 38
  • Well, it turns out I cannot use procedures and functions on my free hosting :/ Thank you for your involvement, the answer is probably correct (one note: splitting the artist should be much simpler: "SELECT DISTINCT artist FROM list") but all I can do is to upvote it. – kuznikowski May 13 '18 at 07:17
  • Thanks for the upvote - not being able to do procedures/functions is a key fact that should be included in the question. uot sure about your note - You can't just split based on DISTINCT artist - because there may be the same artist in different titles. – dbmitch May 13 '18 at 16:22
0

Do the fact you have not a nomalized schema for the table list your expected result seems hard to find ..

you could try an inner join on like for get (at least) the result similar artis on different rows

  select a.artist, t.my_num 
  from list 
  inner join (
  select artist, count(distinct title) my_num 
  from list 
  group by artist ) t on t.artist like concat('%', a.artist, '%')
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
0

I found one possibility without using procedures, it is to create a temporary table and then update the total column. The result is correct, but it takes a lot of time:

CREATE TEMPORARY TABLE IF NOT EXISTS listUnique AS (SELECT artist FROM list GROUP BY artist, title);

CREATE TEMPORARY TABLE IF NOT EXISTS list2 AS (SELECT DISTINCT artist FROM listUnique);
ALTER TABLE list2 ADD total SMALLINT; 
UPDATE list2 SET total = (SELECT COUNT(*) FROM listUnique WHERE artist = list2.artist OR
artist REGEXP CONCAT(" & ", list2.artist) OR
artist REGEXP CONCAT(list2.artist, " & "));
SELECT artist FROM list2 ORDER BY total DESC
kuznikowski
  • 513
  • 1
  • 5
  • 16
  • That looks like it would work - don't know how you'd make it faster – dbmitch May 13 '18 at 16:25
  • `SELECT DISTINCT artist FROM listUnique` - you're still storing the full artist name with multiple artists. You're only splitting by REGEXP not storing the split values in the table – dbmitch May 13 '18 at 16:32