1

I'm writing a query to assign users and their respective domains to IP addresses. No IP address may have duplicate users.

Here's what I've got so far in an SQL Fiddle: http://sqlfiddle.com/#!2/39c51/2/0

I have a table which contains all the (hundreds of thousands of) current assignments. An example on a smaller scale would look like the following:

mysql> select * from test.usermap;
+-------------+-------+-------------------+
| vip         | user  | domain            |
+-------------+-------+-------------------+
| 100.50.20.1 | joe   | joesdomain.com    |
| 100.50.20.1 | bob   | joesdomain.com    |
| 100.50.20.2 | tom   | domain2.com       |
| 100.50.20.2 | fred  | domain2.com       |
| 100.50.20.2 | sally | domain2.com       |
| 100.50.20.3 | admin | athriddomain.com  |
| 100.50.20.4 | admin | numberfour.com    |
| 100.50.20.3 | sally | fivewithsally.com |
| 100.50.20.4 | jim   | thesix.com        |
| 100.50.20.1 | admin | seven.com         |
| 100.50.20.1 | sally | seven.com         |
| 100.50.20.1 | sue   | seven.com         |
| 100.50.20.5 |       |                   |
| 100.50.20.6 |       |                   |
+-------------+-------+-------------------+
14 rows in set (0.00 sec)

I have another table, which contains the yet-to-be assigned users, again, a small scale example:

mysql> select * from test.newusers;
+-------+-----------+
| user  | domain    |
+-------+-----------+
| jim   | eight.com |
| sally | eight.com |
| admin | nine.com  |
| james | ten.com   |
| jane  | ten.com   |
+-------+-----------+
5 rows in set (0.00 sec)

The idea here is to assign all users under eight.com to .5 because thats the earliest IP which has neither a 'jim' nor a 'sally', and then nine.com to .2 and ten.com to .1 because of their respective user conflicts (or lack thereof).

The result I'm looking for would look like:

+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.01 sec)

I can do this with a subquery inside a correlated subquery, like so:

mysql> select  
(
    select vip 
    from test.usermap
    where vip not in
    (
        select distinct vip 
        from test.usermap  
        where user in
        (
            select user 
            from test.newusers 
            where domain = n.domain
        )
    )
    order by inet_aton(vip) asc
    limit 1
) as vip, n.user, n.domain 
from test.newusers n
order by inet_aton(vip) asc;
+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.00 sec)

But that is horrendously inefficient, and my production mapping and newusers tables are 300k and 50k rows respectively, so this is out of the question.

I'm trying to make this more efficient by using joins instead of nested subqueries, so I replaced the inner query with a join and listed the outer query's columns in the ON clause, but it seems this isn't possible:

mysql> select 
(
    select distinct vip 
    from test.usermap u 
    join test.newusers r
        on r.domain = n.domain
        and r.user != u.user
    order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n;
ERROR 1054 (42S22): Unknown column 'n.domain' in 'on clause'
mysql> 

While the logic of the query itself makes sense, because replacing the outer query reference with a string constant it would represent works fine:

mysql> select
(
    select distinct vip 
    from test.usermap u 
    join test.newusers r
        on r.domain = 'ten.com'
        and r.user != u.user
    order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n
where domain = 'ten.com';
+-------------+-------+---------+
| vip         | user  | domain  |
+-------------+-------+---------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane  | ten.com |
+-------------+-------+---------+
2 rows in set (0.00 sec)

My question is: Is there a way to reference a column from an outer query inside of a join on an inner query? If not, what kind of (if any) alternatives exist without nesting subqueries in an inefficient manner?

Again, I have a fiddle here: http://sqlfiddle.com/#!2/39c51/2/0

jesse_galley
  • 1,676
  • 4
  • 18
  • 30

1 Answers1

3

I am not sure how much, if any, more efficient this will be, but the query can be rewritten without nesting multiple subqueries:

SELECT  INET_NTOA(MIN(INET_ATON(UserMap.VIP))) AS VIP,
        NewUsers.User, 
        NewUsers.Domain
FROM    NewUsers
        CROSS JOIN UserMap
        LEFT JOIN
        (   SELECT  u.Domain, m.VIP
            FROM    NewUsers u
                    INNER JOIN UserMap m
                        ON u.User = m.User
        ) ex
            ON ex.Domain = NewUsers.Domain
            AND ex.VIP = UserMap.VIP
WHERE   ex.Domain IS NULL
GROUP BY NewUsers.User, NewUsers.Domain
ORDER BY VIP ASC;   

Example on your SQL Fiddle

ADDENDUM

The query above will not return rows where there is no VIP available, e.g. if 100.50.20.5 and 100.50.20.1 are removed from UserMap your query would return:

VIP             USER    DOMAIN
NULL            jim     eight.com
NULL            sally   eight.com
100.50.20.1     james   ten.com
100.50.20.1     jane    ten.com
100.50.20.2     admin   nine.com

Whereas the query I have written will only return the rows where VIP is not null:

VIP             USER    DOMAIN
100.50.20.1     james   ten.com
100.50.20.1     jane    ten.com
100.50.20.2     admin   nine.com

To get around this you can use a UNION:

SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP,
        a.User, 
        a.Domain
FROM    (   SELECT  UserMap.VIP,
                    NewUsers.User, 
                    NewUsers.Domain
            FROM    NewUsers
                    CROSS JOIN UserMap
                    LEFT JOIN
                    (   SELECT  u.Domain, m.VIP
                        FROM    NewUsers u
                                INNER JOIN UserMap m
                                    ON u.User = m.User
                    ) ex
                        ON ex.Domain = NewUsers.Domain
                        AND ex.VIP = UserMap.VIP
            WHERE   ex.Domain IS NULL
            UNION ALL
            SELECT  NULL AS VIP,
                    NewUsers.User,
                    NewUsers.Domain
            FROM    NewUsers
        ) a
GROUP BY a.User, a.Domain
ORDER BY VIP ASC;

Revised Example on SQL Fiddle

I am not sure what your logic is for handling cases where there is no VIP available so can't really suggest a solution to this part. But you can get the next VIP using this:

SELECT  INET_NTOA(MAX(INET_ATON(UserMap.VIP)) + 1) AS NextVIP
FROM    UserMap

Another issue with your problem is collisions in NewUsers, e.g. If your NewUsers table contained these records:

('jim','eight.com'),
('sally','eight.com'),
('jim','eleven.com'),
('sally','eleven.com');

Both your query and mine would allocate all of these to VIP 100.50.20.5. If this is likely to happen I think the best way to get around this would be to only insert user names from one domain at any one time. But it can be done using just JOINs:

To simplify the query I have created 2 views

CREATE VIEW UsedVIP
AS
    SELECT  u.Domain, m.VIP
    FROM    NewUsers u
            INNER JOIN UserMap m
                ON u.User = m.User;

CREATE VIEW NewUserMap 
AS
    SELECT  UserMap.VIP,
            NewUsers.User, 
            NewUsers.Domain
    FROM    NewUsers
            CROSS JOIN UserMap
            LEFT JOIN UsedVIP ex
                ON ex.Domain = NewUsers.Domain
                AND ex.VIP = UserMap.VIP
    WHERE   ex.Domain IS NULL;

And the final query is:

SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP,
        a.User, 
        a.Domain
FROM    NewUserMap a
        LEFT JOIN NewUserMap b
            ON a.User = b.user
            AND a.VIP = b.VIP
            AND a.Domain > b.domain
        LEFT JOIN NewUserMap c
            ON a.User = c.user
            AND b.Domain = c.domain
            AND b.VIP < c.VIP
WHERE   c.user IS NULL
GROUP BY a.User, a.Domain
ORDER BY VIP ASC;

Which returns:

VIP             USER    DOMAIN
100.50.20.1     jane    ten.com
100.50.20.1     james   ten.com
100.50.20.2     admin   nine.com
100.50.20.5     sally   eight.com
100.50.20.5     jim     eight.com
100.50.20.6     jim     eleven.com
100.50.20.6     sally   eleven.com

Example on SQL Fiddle

GarethD
  • 68,045
  • 10
  • 83
  • 123
  • Well, in my definition, that **is** a correlated subquery. (it correlates to the main query via the `WHERE u.Domain = NewUsers.Domain AND m.VIP = UserMap.VIP` clause) BTW: I liked the EXISTS version better. ( `LEFT JOIN ... WHERE NULL` is only confusing, IMHO) – wildplasser May 09 '13 at 22:37
  • @wildplasser I see your point, I had already edited the answer though to use `LEFT JOIN/IS NULL` rather than `NOT EXISTS` since MySQL optimises this better. This is definitely not a correlated subquery. – GarethD May 09 '13 at 22:45
  • It was a correlated subquery before you edited it into the ugly `LEFT JOIN WHERE NULL` version. **Semantically** they are identical though, so performance should not be an issue. – wildplasser May 09 '13 at 22:48
  • They are not identical in terms of execution plans, as has been [documented here](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/), and summarised in [this answer](http://stackoverflow.com/questions/2246772/whats-the-difference-between-not-exists-vs-not-in-vs-left-join-where-is-null). In addition comparison of the two methods in [this fiddle](http://sqlfiddle.com/#!2/39c51/29) shows two different plans. – GarethD May 09 '13 at 22:51
  • http://en.wikipedia.org/wiki/Law_of_the_instrument IMHO you are playing twenty questions with the compiler/generator. – wildplasser May 09 '13 at 22:54
  • @wildplasser I don't understand your point? Is your concern over my use of `LEFT JOIN\IS NULL`, the fact that I referred to `NOT EXISTS` as not being a correlated subquery, or do you not think this is a better solution than the OP's current solution or your solution? Posting links to articles with little explanation, and cryptic statements regarding the compiler is not constructive feedback in my opinion. – GarethD May 09 '13 at 23:03
  • The problem is IMHO that you are confusing efficency with effectivity. The OQ was a *logic problem*, and not a efficiency issue. BTW: the question cannot be solved in the general case, because there is no guarantee that `COUNT( IP_addresses) >= COUNT(usernames)` so it winds down to a data modelling problem. – wildplasser May 09 '13 at 23:10
  • 2
    To quote from the question **I'm trying to make this more efficient by using joins instead of nested subqueries** and also **If not, what kind of (if any) alternatives exist without nesting subqueries in an inefficient manner?** This is the part of the question I was answering as I do not believe there is a way to solve the first part: **Is there a way to reference a column from an outer query inside of a join on an inner query?**. So it does relate to efficiency. I agree there is a logic problem if there are not enough IP addresses to satisfy new users, but this is not part of the question. – GarethD May 09 '13 at 23:15
  • ... which implies that the OP does not know what he/she is doing. Asking for an **effective** solution for an **insolvable** problem? Makes no sense to me. – wildplasser May 09 '13 at 23:19
  • 1
    wildplasser et al: Thank you for pointing out the logic problem, I understand that ostensibly, there is indeed a logic problem in the sense that there will potentially be a case when I can't assign a user to an IP. This is a non-issue. I have more then enough IPs (several hundred class-Cs) for all the users involved in this project, this has been planned extensively ahead of time. @GarethD Your solution(s) seems to work well, at a glance, they work considerably better then anything I've come up with thus far, thank you for putting in the time to answer my question. – jesse_galley May 10 '13 at 13:16