MySQL many-to-many JOIN returning duplicates

Question

I have three tables: users, jobs and users_jobs. One user can have many jobs and one job can have many users.

Here is my users table:

+----+------+--------+
| ID | Name | gender |
+----+------+--------+
|  1 | Bob  |   male |
|  2 | Sara | female |
+----+------+--------+

My jobs table:

+----+----------+
| id |  job_id  |
+----+----------+
|  1 | Engineer |
|  2 | Plumber  |
|  3 | Doctor   |
+----+----------+

users_jobs table:

+---------+--------+
| user_id | job_id |
+---------+--------+
|       1 |      1 |
|       1 |      2 |
|       1 |      3 |
|       2 |      1 |
+---------+--------+

As an example, I want to select all males who have at least 1 job and return their info. If a user has no jobs, that user should not be selected/returned. This is my query:

SELECT * FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'

But it returns Bob's info 3 times since he has 3 jobs. I don't want duplicates. How can I get rid of them without using DISTINCT or GROUP BY, since performance is very important here?

Thank you!

DISTINCT and GROUP BY are what you use for this, including for performance! It seems there is no other way. – hs-dev2 MR, Jul 12 '19 at 05:04
I'd say that you would have to use some form of `GROUP BY`, `DISTINCT` or sub-query to get unique results. Out of those I would go for `DISTINCT` if you're going for performance. – Jerome, Jul 12 '19 at 05:05
@Jerome OK, I added `SELECT DISTINCT *` but it's still showing duplicates... – sitefix, Jul 12 '19 at 05:13
Please in code questions give a [mre]--cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & clear specification & explanation. That includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) PS What *exactly* is a "duplicate"? – philipxy, Jul 12 '19 at 05:16
You haven't given everything I summarized, so I don't know why you think you have given a [mre]. PS What *exactly* do you mean by "duplicate"? PS "Performance" doesn't mean anything in particular, and unless you define exactly the tradeoffs involved in dealing with it you haven't asked an answerable question & anyway you need to learn a lot more about querying before you worry about it. – philipxy, Jul 12 '19 at 05:26
@philipxy I'm not sure why you don't understand the question when everyone else has. I have my `SELECT` query and the output that it is giving me (duplicate results since Bob has 3 jobs). I want to get rid of the duplicates, like I said, without using DISTINCT or GROUP BY. By performance I mean speed. In another question, someone was saying to try to avoid using DISTINCT and GROUP BY since they can bog down performance. I was posting this question in hopes someone had a better idea (maybe change up the `WHERE` clause?) for removing duplicates. – sitefix, Jul 12 '19 at 05:32
Judging from the answers & their comments, if this were clear it would be a FAQ. Read about how group by & aggregation work. [Error related to only_full_group_by when executing a query in MySql](https://stackoverflow.com/q/34115174/3404097) Before considering posting please always google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings, names & line numbers & then read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. – philipxy, Jul 12 '19 at 10:14

score 3 · Answer 1 · answered Jul 12 '19 at 05:36

MySQL allows one slightly odd thing: you can select columns that appear neither in the GROUP BY clause nor inside aggregate functions (most other SQL engines do not allow this). With ONLY_FULL_GROUP_BY enabled, which is the default since MySQL 5.7.5, this is limited to columns that are functionally dependent on the grouped ones. While this can sometimes produce unexpected results, it works as long as you don't select data that can differ between the rows being grouped.

So, for your question: the query WILL return multiple rows for the same user, as some of them have many jobs (busy life, huh?). You generally can't get all their jobs in a single row, since each row of the result is the user's data plus one of their jobs; that's what we JOIN on. That's not the whole story, though: you can use GROUP BY and GROUP_CONCAT() to concatenate the job data into a single string. I wouldn't generally recommend it, but if it's what you need...

SELECT u.Name, GROUP_CONCAT(j.job_id SEPARATOR ', ') as jobs
FROM users u
JOIN users_jobs uj
  ON u.ID = uj.user_id
JOIN jobs j 
  ON j.id = uj.job_id
GROUP BY u.ID

This would return

Name    | jobs
--------+-------------------------------
Bob     | Engineer, Plumber, Doctor
Sara    | Engineer

If you only want males, add a WHERE clause:

SELECT u.Name, GROUP_CONCAT(j.job_id SEPARATOR ', ') as jobs
FROM users u
JOIN users_jobs uj
  ON u.ID = uj.user_id
JOIN jobs j 
  ON j.id = uj.job_id
WHERE u.gender = 'male'
GROUP BY u.ID

See live fiddle at http://sqlfiddle.com/#!9/df0afe/2

Cool idea! I don't think it would affect performance too much either. – Jerome, Jul 12 '19 at 05:45
In the end, it all depends on what you need to achieve - if the goal is to output it as the query returns it with `GROUP_CONCAT()` - great! Though I generally would just return the data as it comes from the normally joined query (without `GROUP BY`) and format it and output as needed in PHP rather than SQL. That way you get to keep all the data for each row. But as I said, depends on what you're trying to achieve. :-) – Qirel, Jul 12 '19 at 05:47
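
The approach from the last comment (run the plain join, then fold the rows per user in application code) can be sketched roughly as follows. This is only an illustration: it uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, and the `users` dictionary layout is an assumption, not something from the thread.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, job_id TEXT);
CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female');
INSERT INTO jobs VALUES (1, 'Engineer'), (2, 'Plumber'), (3, 'Doctor');
INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Plain join without GROUP BY: one row per (user, job) pair.
rows = conn.execute("""
SELECT u.id, u.name, j.job_id
FROM users u
JOIN users_jobs uj ON u.id = uj.user_id
JOIN jobs j ON j.id = uj.job_id
WHERE u.gender = 'male'
ORDER BY u.id, j.id
""").fetchall()

# Fold the joined rows into one entry per user, keeping every job.
users = {}
for user_id, name, job in rows:
    entry = users.setdefault(user_id, {"name": name, "jobs": []})
    entry["jobs"].append(job)

print(users)
# {1: {'name': 'Bob', 'jobs': ['Engineer', 'Plumber', 'Doctor']}}
```

Each user shows up once in the output, but none of the per-job data from the join is thrown away.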

score 2 · Answer 2 · answered Jul 12 '19 at 06:23

This may help you: you can use the LIMIT keyword to limit the number of records fetched.

SELECT * FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
LIMIT 1;

Hope this helps!
Thanks!
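
One thing to keep in mind with this approach: LIMIT caps the whole result set, not the rows per user. The sketch below tries to show that, again using Python with in-memory SQLite as a stand-in for MySQL. The third user, Tom, is a hypothetical row added only for illustration and is not part of the question's data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
-- Tom is a hypothetical extra user, not part of the question's data.
INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female'), (3, 'Tom', 'male');
INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2);
""")

# The query from the answer above: LIMIT 1 keeps a single row overall.
limited = conn.execute("""
SELECT users.name FROM users
INNER JOIN users_jobs ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
LIMIT 1
""").fetchall()

# For comparison: the males that actually have at least one job.
males_with_jobs = conn.execute("""
SELECT DISTINCT users.id FROM users
INNER JOIN users_jobs ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
""").fetchall()

print(len(limited), len(males_with_jobs))
```

With two matching males, the LIMIT 1 query still returns only one row, so it only removes the duplicates correctly when a single user matches.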

score 0 · Accepted Answer · edited Jul 12 '19 at 06:01

To follow on from the comments: if performance matters, DISTINCT is the option to go for. Try:

SELECT DISTINCT Name FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'

If you want all the user columns but only one row per user id, you can use GROUP BY. Selecting only the users columns keeps the query valid under ONLY_FULL_GROUP_BY, assuming users.id is the primary key. Try:

SELECT users.* FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
GROUP BY users.id

This will also affect performance, though, so it depends on what you prioritize most.
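
As for why `SELECT DISTINCT *` was reported in the comments to still show duplicates: DISTINCT compares whole rows, and with `*` each row also carries the users_jobs columns, which differ per job. A rough sketch of the difference, using Python with in-memory SQLite as a stand-in for MySQL (the variable names are illustrative only):

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female');
INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# DISTINCT over every column: the job_id differs per row, so nothing collapses.
distinct_all = conn.execute("""
SELECT DISTINCT * FROM users
INNER JOIN users_jobs ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
""").fetchall()

# DISTINCT over the users columns only: the three Bob rows are identical.
distinct_users = conn.execute("""
SELECT DISTINCT users.* FROM users
INNER JOIN users_jobs ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
""").fetchall()

print(len(distinct_all))   # 3
print(distinct_users)      # [(1, 'Bob', 'male')]
```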

answered Jul 12 '19 at 05:16 by Jerome · edited Jul 12 '19 at 06:01 by DColt

If I do `SELECT DISTINCT users.name...`, it will only return the name. But I want to select all the info but only distinct `users.id` – sitefix Jul 12 '19 at 05:22
How do you expect it to return everything, but not everything? You either return all the rows, or just some rows - but if you only return some rows, you will not see the data you lost (i.e., if you `GROUP BY users.id`, you will not see all their jobs). When you fetch the data in PHP, you can change it there to only display the user ID once. Alternatively, you can use `GROUP_CONCAT()`, but generally it could be better to just process the data in PHP and output it as you need it. – Qirel Jul 12 '19 at 05:28
By everything, I mean all the columns. But yeah you will absolutely lose information by doing a group by. I'll update it now. – Jerome Jul 12 '19 at 05:29
I did `SELECT DISTINCT users.* FROM...` and that worked. looks like using `DISTINCT` or `GROUP BY` is the only way... – sitefix Jul 12 '19 at 05:33
Maybe you should begin from reading why the join is called JOIN. Some basic algebra tutorial: {A} join {B} means.. go wonder, joined {A, B}. This is not a "duplicate", it's the data you requested. If Bob has three jobs, you get {Bob, Job A}, {Bob, Job B} and {Bob, Job C}, how else could this work? Distinct wouldn't work in this case because those are 3 different data sets. – Mike Doe Jul 12 '19 at 06:08
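
For completeness, one option nobody in this thread proposed is to avoid the join altogether and only test for a matching users_jobs row with EXISTS. Since no join rows are produced, there is nothing to deduplicate. The following is a minimal sketch, not a benchmark: it runs on in-memory SQLite via Python as a stand-in for MySQL. Whether it is actually faster than DISTINCT depends on the indexes (for example one on users_jobs.user_id) and on the optimizer, so checking with EXPLAIN on the real data would be the safer route.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female');
INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Keep each male user at most once; EXISTS only checks that a job row is there.
males_with_jobs = conn.execute("""
SELECT u.*
FROM users u
WHERE u.gender = 'male'
  AND EXISTS (SELECT 1 FROM users_jobs uj WHERE uj.user_id = u.id)
""").fetchall()

print(males_with_jobs)  # [(1, 'Bob', 'male')]
```

The trade-off is that this returns only the users columns; if the job data is needed as well, a join is still required.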
