Complicated SQL Query About Joining And Limitting

Question

SELECT DISTINCT users.id as expert_id, users.firstname, users.lastname
    , projects.id as project_id, projects.project_title
    , projects.project_budget, projects.created as project_created
FROM USERS
RIGHT JOIN expert_skills ON expert_skills.expert_id = users.id
JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
JOIN projects ON projects.id = project_skills.project_id
WHERE projects.status = 1

This query brings me distinct projects which are related to user but i want to limit project count per expert. For example i want projects that related to expert but project count can be 10 maximum. I need a limit implement to my query. How can i done this ? Thank You

You should include your table schema, expected output format as so on ... — ajreal, Dec 29 '10 at 02:59
Do you mean: you want the query to only show up to 10 projects for each user? or to only show users with 10 projects or less? — Lynette Duffy, Jan 10 '11 at 20:07
You might find this [post](http://forums.mysql.com/read.php?20,128132,128134#msg-128134) from the MySQL forums to be helpful. — Kyle, Dec 29 '10 at 02:19

score 0 · Answer 1 · edited Jan 18 '21 at 12:31

This would be really easy if MySQL had row_number, but it doesn't.

So you'll need to modify Andomar's answer to a similar question instead.

 set @num := 0, @projectid:= -1;


 SELECT DISTINCT users.id as expert_id, users.firstname, users.lastname, projects.id as                project_id, projects.project_title, projects.project_budget, projects.created as project_created
 FROM 
     USERS
        RIGHT JOIN expert_skills ON expert_skills.expert_id = users.id
        JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
        JOIN projects ON projects.id = project_skills.project_id
    JOIN
    (select  
            users.id uid,
           project.id pid,
           @num := if(@projectid= projects.id, @num + 1, 1) 
                as row_number,
            @projectid:= project.id
    from    USERS
        RIGHT JOIN expert_skills ON expert_skills.expert_id = users.id
        JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
        JOIN projects ON projects.id = project_skills.project_id
    WHERE projects.status = 1
    order by 
            users.id, project.id desc
    ) as projectNum 
    on users.id = pid
    and projects.id = pid
   where   projectNum.row_number <= 10

see this query in data.stackoverflow.com for how to do it in DB's that support windowing

Thanks for windowing information but i can't get work this query. It returns this error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select users.id uid, project.id pids, @num :' at line 8 — mTuran, Dec 29 '10 at 04:30
sorry about that forgot the join and one of the tables names was wrong. — Conrad Frix, Dec 29 '10 at 06:04
I fix some mistakes(project.id to projects.id) and now query works but it returns zero row every time. — mTuran, Dec 29 '10 at 07:30

score 0 · Answer 2 · answered Jan 04 '11 at 13:14

Limiting by subselect with rownum should work just fine

SELECT DISTINCT users.id as expert_id, users.firstname, users.lastname, projects.id as project_id, projects.project_title, projects.project_budget, projects.created as project_created FROM USERS RIGHT JOIN expert_skills ON expert_skills.expert_id = users.id JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id JOIN (select * from (select a.project_id, additional_rows from projects) a where rownum < 11) projects_skills ON projects.id = project_skills.project_id WHERE projects.status = 1

Steven de Salas · Answer 3 · 2011-01-09T02:25:26.850

Its possible using a combination of stored procedures, in-memory tables and control flow using a WHILE loop:

DELIMITER $$

DROP PROCEDURE IF EXISTS `Get10ProjectsPerUserByStatus`$$
-- 1. DEFINE STORED PROCEDURE
CREATE PROCEDURE `Get10ProjectsPerUserByStatus`(_status INT)
BEGIN
    -- 2. DECLARE VARIABLES AND IN-MEMORY TABLES
    DECLARE _id INT;
    DROP TABLE IF EXISTS temp_user;
    DROP TABLE IF EXISTS temp_project_user;
    CREATE TABLE temp_user (id INT) ENGINE=MEMORY;
    CREATE TABLE temp_project_user (p_id INT, u_id INT) ENGINE=MEMORY;

    -- 3. ADD ALL USERS AND LOOP BY REMOVING 1 USER AT A TIME
    INSERT INTO temp_user SELECT id FROM users;
    WHILE (SELECT COUNT(*) FROM temp_user) > 0 DO
        SET _id = (SELECT MIN(id) FROM temp_user);
        INSERT INTO temp_project_user
        SELECT ps.project_id, _id
        FROM project_skills ps 
        JOIN expert_skills es ON ps.skill_id = es.skill_id
        WHERE es.expert_id = _id
        LIMIT 10;
        DELETE FROM temp_user WHERE id = _id;
    END WHILE;

    -- 4. SELECT FROM IN-MEMORY TABLE AND JOIN TO EXISTING SCHEMA
    SELECT DISTINCT t.u_id AS expert_id,
    u.firstname, 
    u.lastname,
    t.p_id AS project_id, 
    p.project_title,
    p.project_budget, 
    p.created AS project_created
    FROM temp_project_user t
    INNER JOIN users u ON u.id = t.u_id
    INNER JOIN projects p ON p.id = t.p_id;

    -- 5. DROP IN-MEMORY TABLES
    DROP TABLE temp_user;
    DROP TABLE temp_project_user;
    END$$

DELIMITER ;

-- 6. CALL STORED PROCEDURE & DROP WHEN FINISHED
CALL Get10ProjectsPerUserByStatus(1);

DROP PROCEDURE IF EXISTS `Get10ProjectsPerUserByStatus`;

NOTE: Going beyond the question here but it is actually possible to apply the 10 project limit as a parameter of the stored procedure by using the same logic:

-- GET 10 PROJECTS PER USER WITH STATUS = 1
CALL GetProjectsPerUserByStatus(1, 10)

To do this you'd need 1 additional in-memory table and another WHILE loop

John Petrak · Answer 4 · 2011-01-10T10:11:59.243

How about something like this.

SELECT  DISTINCT 
  users.id as expert_id, 
  users.firstname, 
  users.lastname, 
  myProjects.project_id, 
  myProjects.project_title, 
  myProjects.project_budget, 
  myProjects.project_created
FROM USERS RIGHT JOIN 
  expert_skills ON expert_skills.expert_id = users.id JOIN 
  project_skills ON project_skills.skill_id = expert_skills.skill_id JOIN 
  (SELECT P1.id as project_id, 
    P1.project_title, 
    P1.project_budget, 
    P1.created as project_created  
  FROM projects AS P1
  WHERE   P1.status = 1 AND 
    (SELECT COUNT(*)
    FROM projects AS P2 
    WHERE P2.ID <= P1.ID) <= 10) AS myProjects ON myProjects.id = project_skills.project_id

Vinicius · Answer 5 · 2011-01-10T17:12:23.450

0

Try this:

SELECT 
  users.id as expert_id, users.firstname,
  projects.id as project_id, projects.project_title,
  count(distinct users.id, ps2.project_id)
FROM USERS
  JOIN expert_skills ON expert_skills.expert_id = users.id
  JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
  JOIN projects ON projects.id = project_skills.project_id 
  JOIN expert_skills es2 ON es2.expert_id = users.id
  LEFT JOIN project_skills ps2 ON ps2.skill_id = es2.skill_id and ps2.project_id < projects.id
WHERE projects.status = 1
group by users.id, users.firstname, projects.id, projects.project_title
having count(distinct users.id, ps2.project_id) < 10
order by users.id, projects.id

Assuming that any subset of 10 projects are valid, otherwise you will have to adjust the criteria.

I removed some fields from the select just for the sake of clarity.

edited Jan 10 '11 at 17:12

answered Jan 10 '11 at 16:57

Vinicius

1,060
1
12
21

1

Isn't it filtering out expects that have more than 10 projects instead of returning their 10 projects? – Schultz9999 Jan 11 '11 at 01:11
@Schultz999 No, it is not filtering out users that have more than 10 projects. Pay attention to the fact that there are 2 expert_skills tables and 2 project_skills tables, I use them to COUNT the projects per user and use that count as a limiting factor. The trick is the clause "ps2.project_id < projects.id", it makes the "count(distinct users.id, ps2.project_id)" act like a counter per user. – Vinicius Jan 13 '11 at 00:21
To be more explicit, without the group by/having the results would show each (user, project) tuple repeated N times, where N is the number of other (user, project) tuples that have an project.id less than the main project.id. When you group the count(...) will show up like 0, 1, 2, 3... for each user at the correct order, reseting for the next user. With the having clause the (user,project) tuples with count > 10 are discarded. – Vinicius Jan 13 '11 at 00:34

score 0 · Answer 6 · answered Mar 03 '12 at 23:54

I think what you need is a subquery with a TOP 10 in the inner one.

Something like this (I don't really get the query, so it may not be exactly what you need)

SELECT users.id as expert_id, users.firstname, users.lastname, projs.*
  FROM Users
       RIGHT JOIN 
       (SELECT DISTINCT TOP 10 projects.id as project_id, projects.project_title, projects.project_budget, projects.created as project_created
       FROM expert_skills 
            right JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
            right JOIN projects ON projects.id = project_skills.project_id
      WHERE projects.status = 1 ) projs ON expert_skills.expert_id = users.id

Doug · Answer 7 · 2012-03-05T21:17:57.170

Most Databases have a way to ask for the first X rows of the query. In DB2 you can use the following statement to return the first 10 rows.

select * from tableName fetch first 10 rows only;

I believe that in MySql you use the limit key word.

select * from tableName limit 10;

This wont work if your wanting only people who have less than or equal to 10 projects. I wasn't clear if you wanted to get the ones who had 10 or less projects or if you wanted to only return the first 10 projects.

score 0 · Answer 8 · answered Mar 08 '12 at 14:55

I would start in reverse... get the projects first, then find those skills and who has those skills, tack on a sequence counter per "expert". From that result set, just apply a where clause to exclude beyond your "cut-off". By using @ mysql variables, and using an ORDER BY clause, you get the results returned in order of the expert ID, so when setting them, see if the same expert. If so, add one to the sequence counter. If different, set to zero... THEN, update who the last expert was for its comparison on the next record in the result set.

SELECT
      PreQuery.expert_id, 
      PreQuery.firstname, 
      PreQuery.lastname, 
      PreQuery.Project_ID,
      PreQuery.Project_Title,
      PreQuery.Project_Budget,
      PreQuery.Project_Created,
      PreQuery.LastSeq
   from 
      ( SELECT DISTINCT
              U.id as expert_id, 
              U.firstname, 
              U.lastname, 
              P.ID as Project_ID,
              P.Project_Title,
              P.Project_Budget,
              P.Created as Project_Created,
              @Seq := if( @LastExpert = U.ID, @Seq +1, 1 ) LastSeq,
              @LastExpert := U.ID as IgnoreThis
           from
              Projects P
                 JOIN Project_Skills PS
                    ON P.ID = PS.Project_ID
                    JOIN Expert_Skills ES
                       ON PS.Skill_ID = ES.Skill_ID
                       JOIN Users U
                          ON ES.Expert_ID = U.ID,
              (select @Seq := 0, @LastExpert = 0 ) SQLVars   
           where
              P.Status = 1
           order by 
              U.ID ) PreQuery
   where
      PreQuery.LastSeq < 11

score -1 · Answer 9 · answered Jan 10 '11 at 10:30

How about this...

SELECT users.id as expert_id, users.firstname, users.lastname, projects.id as project_id, projects.project_title, projects.project_budget, projects.created as project_created
FROM USERS
RIGHT JOIN expert_skills ON expert_skills.expert_id = users.id
JOIN project_skills ON project_skills.skill_id = expert_skills.skill_id
JOIN projects ON projects.id = project_skills.project_id
WHERE projects.status = 1
GROUP BY users.id as expert_id, users.firstname, users.lastname, projects.id as project_id, projects.project_title, projects.project_budget, projects.created
HAVING COUNT(*) < 10

This is assuming that your original query does exactly what you want but you just want to limit the results to where your returned rows are less than 10.

Probably not as efficient as other posts here and I may have the wrong end of the stick but worth my 2 cents :)

Grouping by all the output fields will return the same result unless there are duplicate records. The question is about returning top 10 projects for each expert, not the same projects created more than 10 times. — Schultz9999, Jan 11 '11 at 01:07

score -1 · Answer 10 · edited May 23 '17 at 10:29

I suggest using DENSE_RANK. I don't have your table structure so I have created my own, user and project table with 1 to many relationship. So the query to select last 2 projects for the user is this (note, you may put that into a stored proc and use the number as an input parameter):

declare @numberOfProjects int
set @numberOfProjects = 3

;with topNProjects(userid, projectid, createdtutc, dense_rank)
as (
    select p.userid, P.ProjectId, p.CreatedDtUtc, DENSE_RANK() OVER (PARTITION BY P.UserId ORDER BY P.ProjectId) AS DENSE_RANK
    from DS_Project P
)
select userid, projectid, createdtutc from topNProjects
where dense_rank <= @numberOfProjects
order by projectid desc

DENSE_RANK effectively returns the number records before the current one + 1 within the grouped by fields.

EDITED: I was pointed out this was the problem for MySQL, not MSSQL. I don't have MySQL installed but apparently ranking is not there yet (or ever). I did some search on ranking and I found this SO post which may help to answer the original question: How to perform grouped ranking in MySQL. So once ranking is added, the logic stays the same - select all records that have the rank less than a sought number of projects per user.

Close, but no cigar, as far as I can see. He asked for MySQL, this is MSSQL and possibly others — data, Jan 11 '11 at 03:17
@data: yeah, you're right... didn't expect MySql was lacking ranking functions. Now I see the bug for that opened in '06 and is still opened. — Schultz9999, Jan 11 '11 at 05:28
If you down vote me, please explain why so that I wouldn't make mistakes answering other questions. — Schultz9999, Jan 11 '11 at 23:37

Complicated SQL Query About Joining And Limitting

10 Answers10

Linked