0

Couldn't properly describe problem in title, sorry. Let's say, i've got two tables A(id, other columns) and B(id, a_id, other columns), where a_id - foreign key of A table. A has ~10 000 records and B around million. I need to select random record from A, which has no relations in table B. It can be done like this:

SELECT * FROM (SELECT A.id, B.id as b_id FROM A LEFT JOIN B ON B.a_id = A.id ORDER BY RAND()) tmp WHERE b_id IS NULL LIMIT 1;

However, subquery executes completly before applying where and limit, so resulting query executes for unacceptable time. I'm wondering, is there a more intellegent way of doing the same without using subquery.

P.S. This is partially solved abusing the fact that ~90% of subquery actually has no matches in B, so i'm first running this query:

SELECT * FROM (SELECT A.id, B.id as b_id FROM A LEFT JOIN B ON B.a_id = A.id ORDER BY RAND() LIMIT 10) tmp WHERE b_id IS NULL LIMIT 1;

And only if it has no hits i'm running the first one. It works, but looks really bad.

goose3228
  • 361
  • 1
  • 8
  • Did you tried compare the performance between LEFT join EXISTS with LIMIT? –  May 31 '18 at 16:59

1 Answers1

1

In big data set you can use exists:

SELECT *
FROM A
WHERE NOT EXISTS (
SELECT *
FROM B
WHERE A.id = B.a_id
)
A. Colonna
  • 852
  • 7
  • 10
  • Thanks. This one is only good when you need all records or first. Hovewer, replacing SELECT * FROM A with SELECT * FROM (SELECT * FROM A ORDER BY RAND()) AS C did the trick. – goose3228 Jun 01 '18 at 05:18