-1

I have a requests and a results table. Each with an email_sha256 column.

Requests may contain multiple rows with the same email, while emails are unique in the results. Emails in the results table might not exist in the requests table.

I want to get 100 results that have an email that exists in the requests table:

SELECT results.* FROM results
INNER JOIN requests ON results.email_sha256 = requests.email_sha256
LIMIT 100
  1. This generally works but it may return the same result multiple times if there are multiple requests with the same email. Is there some way to make sure that I get 100 unique results instead of duplicates?

  2. The join seems very slow. Is there some better way to get the desired result. e.g. using EXISTS?

Chris
  • 13,100
  • 23
  • 79
  • 162

2 Answers2

0

1)

This generally works but it may return the same result multiple times if there are multiple requests with the same email. Is there some way to make sure that I get 100 unique results instead of duplicates?

Use GROUP BY Docs.

SELECT results.* FROM results
INNER JOIN requests ON results.email_sha256 = requests.email_sha256
GROUP BY results.email_sha256 
LIMIT 100

2)

The join seems very slow. Is there some better way to get the desired result. e.g. using EXISTS?

We can't specifically answer this without an explanation and/or information about the table(s). However the most probably answer is that you have not indexed the correct columns.

You should have an index on your JOINing column and your GROUP BY column(s). In this case that's the same - results.email_sha256 and requests.email_sha256.

That's a good start there's also plenty of more specific Q&A on Stack Overflow on various issues of slow result return by MySQL....

Martin
  • 22,212
  • 11
  • 70
  • 132
  • Sry. Bad english. I think it should be 'they might not exist' – Chris Mar 13 '19 at 21:43
  • Say that there are 500 results and 2000 requests. In the requests there are only 300 distinct emails. So the OP wants 100 results that exist in requests. Where is the contradiction? – forpas Mar 13 '19 at 21:54
  • @forpas it was simple terminology. The qualifier `Emails in the results table may not exist in the requests table.` does not apply because it's an `INNER JOIN`. – Martin Mar 13 '19 at 22:34
0

With EXISTS:

SELECT r.* FROM results r
WHERE EXISTS (
 SELECT 1 FROM requests WHERE email_sha256 = r.email_sha256
)
LIMIT 100

This returns 100 unique rows rows since email_sha256 is unique in results .

forpas
  • 160,666
  • 10
  • 38
  • 76