MySQL groupwise maximum

Question

EDIT UPDATE: Turns out I have version 5.7 so Window Functions are not an option in order to find a solution.

SHOW VARIABLES LIKE 'version';
+---------------+------------+
| Variable_name | Value      |
+---------------+------------+
| version       | 5.7.21-log |
+---------------+------------+

Description of the problem: I have a ternary relationship table between offers, skills and profiles. That ternary relationship has an attribute, ranking.

I have a skill table where I can see the skill's name. Up untill now I had to do two queries:

1) Give me the top 10 ranked skills per profile:

SELECT DISTINCT ternary.id_skill, skill.name_skill, ranking_skill
FROM ternary
INNER JOIN skill ON skill.id_skill=ternary.id_skill
WHERE ternary.id_perfil= #IntNumber#
GROUP BY ternary.id_skill
ORDER BY ternary.ranking_skill DESC
LIMIT 10;

2) For a list of ID skills, give me if they appear in any profile, and how many times they do appear.

SELECT DISTINCT ternary.id_profile, nombre_profile, COUNT(DISTINCT ternary.id_skill) AS matching
FROM ternary
INNER JOIN profile ON ternary.id_profile=profile.id_profile
WHERE ternary.id_skill= '858534430'
  OR ternary.id_skill= '3213227'
  OR ternary.id_skill= '3254818'
GROUP BY(ternary.id_profile)
ORDER BY matching DESC;

In that last query a problem has been identified: It "searches" for a skill appearing at any point for a profile. Since it is possible for a profile to have thousands of skills, it can be misleading, due to what we want to achieve I now need to only "search" when it's one of the top 10 skills of ANY profile. But only in top 10.

So far, basically I have been trying to mix both queries, with little success, because it seems I cannot make a partition over two columns, and even if I use only one, I get You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(PARTITION BY:

SELECT *
FROM
(
   SELECT DISTINCT ternary.id_skill,
                   skill.name_skill,
                   ternary.ranking_skill,
                   ternary.id_profile,
                   ROW_NUMBER() OVER(PARTITION BY id_profile, id_skill ORDER BY ternary.ranking_skill DESC) rn
   FROM ternary
   INNER JOIN skill ON skill.id_skill=ternary.id_skill
)
WHERE rn < 11;

I have learned this operation might be called groupwise maximum, and I have seen several answers looking for this. I have not been able to replicate any of them, and I need it specifically for mysql Ver 14.14 Distrib 5.5.60, for Linux (x86_64) using readline 5. if it's of any help (I have tried answers that were perfect for some other, similar databases but would not work in mysql).

Tables definition:

CREATE TABLE `ternary` (
  `id_offer` varchar(200) NOT NULL,
  `id_skill` varchar(200) NOT NULL,
  `id_profile` varchar(200) NOT NULL,
  `ranking_skill` double NOT NULL,
  PRIMARY KEY (`id_offer`,`id_skill`,`id_profile`),
  KEY `id_skill` (`id_skill`),
  KEY `id_profile` (`id_profile`),
  CONSTRAINT `ternary_ibfk_1` FOREIGN KEY (`id_offer`) REFERENCES `offer` (`id_offer`),
  CONSTRAINT `ternary_ibfk_2` FOREIGN KEY (`id_skill`) REFERENCES `skill` (`id_skill`),
  CONSTRAINT `ternary_ibfk_3` FOREIGN KEY (`id_profile`) REFERENCES `profile` (`id_profile`)
)


CREATE TABLE `skill` (
  `id_skill` varchar(200) NOT NULL,
  `name_skill` varchar(200) DEFAULT NULL,
  `date` date DEFAULT NULL,
  PRIMARY KEY (`id_skill`)
  )

Results of doing a

select * from ternay limit 10;

+------------+------------+-----------+----------------------+
| id_oferta  | id_skill   | id_perfil | ranking_skill        |
+------------+------------+-----------+----------------------+
| 1004 | 107              | 679681082 |                    0 |
| 1004 | 115              | 679681082 |  0.10846866454897801 |
| 1004 | 117              | 679681082 | 0.038003619695992294 |
| 1004 | 129              | 679681082 |  0.04987975085098989 |
| 1004 | 147              | 679681082 |  0.02771097269499438 |
| 1004 | 299              | 679681082 |   0.0522549770819894 |
| 1004 | 321              | 679681082 |  0.11955305362697576 |
| 1004 | 417              | 679681082 |  0.11321911701097703 |
| 1004 | 964              | 679681082 | 0.015043099462996949 |
| 1004 | 967              | 679681082 |  0.05304671915898924 |
+------------+------------+-----------+----------------------+

Result of query 1) describe above, which gives me top 10 for ONE profile

+------------+--------------+---------------------+
| id_skill   | name_skill   | ranking_skill       |
+------------+--------------+---------------------+
| 109        | scala        |  0.3089840175329823 |
| 122        | hadoop       | 0.24164146109602963 |
| 9731       | python       | 0.21470443852124863 |
| 325        | java         | 0.18776741594646754 |
| 114        | sql          | 0.14736188208429596 |
| 101        | kafka        | 0.13389337079690544 |
| 301        | bbdd         | 0.13389337079690544 |
| 927        | agile        | 0.13389337079690544 |
| 320        | hive         |  0.1204248595095149 |
| 109        | spark        |  0.1204248595095149 |
+------------+--------------+---------------------+

[Windowed functions are available from MySQL 8.0](https://stackoverflow.com/a/46753800/5070879) — Lukasz Szozda, Jun 26 '18 at 07:19
@LukaszSzozda I see, so OVER(Partition BY) should be possible to do, if that is needed for the solution. Thanks! — monkey intern, Jun 26 '18 at 07:21
Your ROW_NUMBER should be partitioned only by profile: `ROW_NUMBER() OVER(PARTITION BY id_profile ORDER BY ternary.ranking_skill DESC)` — dnoeth, Jun 26 '18 at 07:26
@dnoeth I have tried both ways, and it won't work, I get the same error, which is a Syntax error, so I simply am not writing it properly — monkey intern, Jun 26 '18 at 07:32
@dnoeth I run mysql -V and I get 14.14 so I guess it is the client's version, if I run SHOW VARIABLES LIKE 'version'; I get 5.7.21-log... So you might be right, sadly — monkey intern, Jun 26 '18 at 07:37
Sure thing @D-Shih even though I am afraid it will make the post too long, I'll try! — monkey intern, Jun 26 '18 at 08:10
More on groupwise-max: http://mysql.rjweb.org/doc.php/groupwise_max — Rick James, Jul 14 '18 at 03:49

score 1 · Answer 1 · answered Jun 26 '18 at 08:32

Here is a sample for you make Row_number() with out Window Functions, you can try to write a subquery on select clause.

PARTITION BY column write condition in subquery where clause.
subquery count(*) to make Row_number

look like this.

SELECT * FROM 
(
SELECT *,(
     select (count(*) + 1) rn
     from ternary 
     where 
        t.id_profile = id_profile and 
        t.id_profile = id_profile and 
        ranking_skill > t.ranking_skill
   ) rn
  FROM ternary t
) t
WHERE rn < 11
order by rn

sqlfiddle:http://sqlfiddle.com/#!9/7ee529/9

this query might be work you can try.

SELECT *
FROM
(
  SELECT DISTINCT t.id_skill,
                   skill.name_skill,
                   t.ranking_skill,
                   t.id_profile,
                   (
                     select (count(*) + 1) rn
                     from ternary 
                     where t.id_profile = id_profile and t.id_profile = id_profile
                     and ranking_skill > t.ranking_skill
                   ) rn
   FROM ternary t
   INNER JOIN skill ON skill.id_skill=t.id_skill;
)
WHERE rn < 11;

score 1 · Answer 2 · answered Nov 07 '18 at 20:13

To speed up your first query, change

KEY `id_profile` (`id_profile`),

to

KEY `id_profile` (`id_perfil`, id_skill, id_ranking),

Don't mix DISTINCT and GROUP BY. (GroupBy effectively does Distinct.)

Where does nombre_profile come from? (It is hard to help when there are dangling column names.)

Delay fetching skill.name_skill.

Don't bother passing ranking_skill out of the subquery if it is not to be used.

Move one of the JOIN into a subquery.

Maybe this has the effect of correctly combining the two queries:

SELECT  t.id_profile,
        nombre_profile,
        ( SELECT COUNT(DISTINCT id_skill)
             FROM ternary
             WHERE id_skill = ten.id_skill
        ) AS matching
    FROM  
        (  -- Get the 10 ids:
        SELECT  t.id_skill
            FROM  ternary AS t
            INNER JOIN  skill  ON skill.id_skill = t.id_skill
            WHERE  t.id_profile = #IntNumber#
            GROUP BY  t.id_skill
            ORDER BY  t.ranking_skill DESC
            LIMIT  10 
        ) AS ten
    INNER JOIN  profile AS p  ON t.id_profile = p.id_profile AS p
    GROUP BY(t.id_profile)
    ORDER BY  matching DESC;

Thank you for your answer! I am sorry about the changing names, I often write variables in my language but when I post to SO I make an effort to translate them so they are more easily read. I did not think I had made so many mistakes so I am sorry. — monkey intern, Nov 08 '18 at 07:46

MySQL groupwise maximum

2 Answers2