I'm writing a query to assign users and their respective domains to IP addresses. No IP address may have duplicate users.
Here's what I've got so far in an SQL Fiddle: http://sqlfiddle.com/#!2/39c51/2/0
I have a table which contains all the (hundreds of thousands of) current assignments. An example on a smaller scale would look like the following:
mysql> select * from test.usermap;
+-------------+-------+-------------------+
| vip | user | domain |
+-------------+-------+-------------------+
| 100.50.20.1 | joe | joesdomain.com |
| 100.50.20.1 | bob | joesdomain.com |
| 100.50.20.2 | tom | domain2.com |
| 100.50.20.2 | fred | domain2.com |
| 100.50.20.2 | sally | domain2.com |
| 100.50.20.3 | admin | athriddomain.com |
| 100.50.20.4 | admin | numberfour.com |
| 100.50.20.3 | sally | fivewithsally.com |
| 100.50.20.4 | jim | thesix.com |
| 100.50.20.1 | admin | seven.com |
| 100.50.20.1 | sally | seven.com |
| 100.50.20.1 | sue | seven.com |
| 100.50.20.5 | | |
| 100.50.20.6 | | |
+-------------+-------+-------------------+
14 rows in set (0.00 sec)
I have another table, which contains the yet-to-be assigned users, again, a small scale example:
mysql> select * from test.newusers;
+-------+-----------+
| user | domain |
+-------+-----------+
| jim | eight.com |
| sally | eight.com |
| admin | nine.com |
| james | ten.com |
| jane | ten.com |
+-------+-----------+
5 rows in set (0.00 sec)
The idea here is to assign all users under eight.com to .5 because thats the earliest IP which has neither a 'jim' nor a 'sally', and then nine.com to .2 and ten.com to .1 because of their respective user conflicts (or lack thereof).
The result I'm looking for would look like:
+-------------+-------+-----------+
| vip | user | domain |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane | ten.com |
| 100.50.20.2 | admin | nine.com |
| 100.50.20.5 | jim | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.01 sec)
I can do this with a subquery inside a correlated subquery, like so:
mysql> select
(
select vip
from test.usermap
where vip not in
(
select distinct vip
from test.usermap
where user in
(
select user
from test.newusers
where domain = n.domain
)
)
order by inet_aton(vip) asc
limit 1
) as vip, n.user, n.domain
from test.newusers n
order by inet_aton(vip) asc;
+-------------+-------+-----------+
| vip | user | domain |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane | ten.com |
| 100.50.20.2 | admin | nine.com |
| 100.50.20.5 | jim | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.00 sec)
But that is horrendously inefficient, and my production mapping and newusers tables are 300k and 50k rows respectively, so this is out of the question.
I'm trying to make this more efficient by using joins instead of nested subqueries, so I replaced the inner query with a join and listed the outer query's columns in the ON clause, but it seems this isn't possible:
mysql> select
(
select distinct vip
from test.usermap u
join test.newusers r
on r.domain = n.domain
and r.user != u.user
order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n;
ERROR 1054 (42S22): Unknown column 'n.domain' in 'on clause'
mysql>
While the logic of the query itself makes sense, because replacing the outer query reference with a string constant it would represent works fine:
mysql> select
(
select distinct vip
from test.usermap u
join test.newusers r
on r.domain = 'ten.com'
and r.user != u.user
order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n
where domain = 'ten.com';
+-------------+-------+---------+
| vip | user | domain |
+-------------+-------+---------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane | ten.com |
+-------------+-------+---------+
2 rows in set (0.00 sec)
My question is: Is there a way to reference a column from an outer query inside of a join on an inner query? If not, what kind of (if any) alternatives exist without nesting subqueries in an inefficient manner?
Again, I have a fiddle here: http://sqlfiddle.com/#!2/39c51/2/0