MySQL efficient join with "in" without duplicates

Question

I have an event table

+----------+------------+
| event_id | event_name |
+----------+------------+
|        1 | event1     |
|        2 | event2     |
|        3 | event3     |
|        4 | event4     |
+----------+------------+

And an event_performer table

+--------------------+----------+--------------+
| event_performer_id | event_id | performer_id |
+--------------------+----------+--------------+
|                  1 |        1 |            1 |
|                  2 |        1 |            2 |
|                  3 |        2 |            1 |
|                  4 |        2 |            2 |
|                  5 |        3 |            3 |
|                  6 |        3 |            4 |
|                  7 |        4 |            3 |
|                  8 |        4 |            4 |
+--------------------+----------+--------------+

I want to get all the events with performer ids 1 and 2, so I run the following query:

select event.* from event
join event_performer
on event.event_id = event_performer.event_id 
and performer_id in (1,2)
order by event_name

When I do that, I obviously get duplicate events (two for event_id 1 and two for event_id 2). What's the most efficient way in MySQL remove the duplicates so that I get each event record once?

One idea is to use select distinct event.* How efficient is that for a large number of fields and records?

Note that the example tables are oversimplified. Each table has MANY more fields and MANY more records.

Rahul Tripathi · Answer 1 · 2015-10-20T05:04:36.380

0

You can try to use Group BY

select event.event_id, event.event_name from event
join event_performer
on event.event_id = event_performer.event_id 
and performer_id in (1,2)
GROUP BY event_name, event.event_id
order by event_name

However do note that if you have created an index on your column then using GROUP BY is similar to using DISTINCT. But if you have not created an index then I would recommend you to go with using DISTINCT as it is comparatively faster than GROUP BY.

edited Oct 20 '15 at 05:04

answered Oct 20 '15 at 05:02

Rahul Tripathi

168,305
31
280
331

Thank you. In my case, I am selecting all the fields in the event table (there are a lot). Should that whole list be included in both the select statement and the group by statement? – stepanian Oct 20 '15 at 05:04
When you say index, if there are many fields, what index would you be referring to? Just the primary key or a multi-column index? – stepanian Oct 20 '15 at 05:05
@stepanian:- Yes, the index on the column which you are trying to check for duplicates. You can refer: [What's faster, SELECT DISTINCT or GROUP BY in MySQL?](http://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql) – Rahul Tripathi Oct 20 '15 at 05:05
The field event_id is a primary key, auto_increment, so it is an index. Can I group by just event_id, or should the group by include all the columns? – stepanian Oct 20 '15 at 05:07
@stepanian:- In that case, I would recommend you to go with DISTINCT as it would be faster than GROUP BY. But if you are using GROUP BY then yes you have to list all the column names. – Rahul Tripathi Oct 20 '15 at 05:08
So "select distinct event.*" as in my original suggestion doesn't add a very big overhead? – stepanian Oct 20 '15 at 05:10
@stepanian:- Yes that would be the best way to proceed with in my opinion. – Rahul Tripathi Oct 20 '15 at 05:11
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/92801/discussion-between-stepanian-and-rahul-tripathi). – stepanian Oct 20 '15 at 05:14

MySQL efficient join with "in" without duplicates

1 Answers1