0

I have 3 tables as follow:

class
  id:bigint(PK)
  name:varchar


principal:
  id: bigint(PK)
  pid:bigint
  flag:boolean
  uniqueConstraint(pid, flag)

entry:
  cid: (FK, reference class)
  pid: (FK, refernce principal)
  object_id: bigint
  code: tinyint
  PK: (cid, pid, obj)

The query must check existence of a record in entry using a parameter set.

suppose parameter set is as follow:

  • class's name: Class#3
  • principal id as user: 3
  • principal id as role: 2
  • object ID: 45

I wrote 2 queries, one using join and one using sub-query:

Query number 1:

select id from entry where pid in
    (select id from principal
         where (pid=2 AND role)
            OR (pid=3 AND !role)
    )
    AND cid = (select id from class where name='Class#3')
    AND object_id=45

And query number 2:

select e.id from class c
            inner join entry e on e.cid=c.id and c.name='Class#3'
            inner join principal p on p.id=e.pid 
                     and p.id in ( select id from principal
                         where (pid=2 AND role)
                            OR (pid=3 AND !role)
                                 )
    where e.object_id=45

Of course there's an additional condition for checking code which I didn't include it in queries.

I want to know which one performs better in large scale production environment. Suppose there are 100 rows in class, 10000 in principal and over 250000 in 'entry', and query(as explained) must be executed for each request, and there are at least 3000 users working on system constantly and simultaneously at any time.

  1. Which of these queries will perform better And why? the reason is so important for further works
  2. Is there better approach to write the query than these 2 approaches or even better approach to construct the schema?

Regards


PS: I Have read this question about comparing sub-query and join, but my question is not exactly a simple comparison

Rick James
  • 135,179
  • 13
  • 127
  • 222
HMD
  • 468
  • 1
  • 5
  • 21
  • 1
    Sure looks like a simple comparision though..why not generate a bunch of testdata and time the queries? – Oli Apr 13 '20 at 10:58
  • @Oli I have generated data and tested using my laptop and _xampp_. When i ran the queries multiple time in _phpMyAdmin_, the execution time decreases every time and stops at a very very low time as millisecond. so i couldn't figure out if should i trust the result or not... – HMD Apr 13 '20 at 11:42
  • 1
    check this out https://stackoverflow.com/questions/181894/mysql-force-not-to-use-cache-for-testing-speed-of-query – Oli Apr 13 '20 at 12:36
  • @Oli thanks for the link. It seems even using `SQL_NO_CACHE` doesn't stop mysql using OS or pool cache. So i need to restart mysql server every time. And the reason i said it's not a comparison, because queries can be written combining both approaches or even other approaches, and unfortunately I don't know how to use results of `explain` keyword to optimize queries. – HMD Apr 13 '20 at 12:43
  • Hmm, well I think it's best practice to always use JOIN statements anyhow. The speed difference is negligible compared to other possible bottlenecks like incorrect indexes, table structure etc.It's well worth reading up on some docs of the EXPLAIN function if you're really interested in optimizing queries. – Oli Apr 13 '20 at 12:46
  • @Oli thanks for the tip. I've read the mysql documentation, it has so much information and i dont have time for reading all of that. It is the first time that i'm about to optimize my queries as much as possible, duo to the environment i stated in the question. anyway, i guess i need to put some time to read the documentation. thanks – HMD Apr 13 '20 at 12:49
  • Please provide `SHOW CREATE TABLE` and `EXPLAIN SELECT ...` – Rick James Apr 14 '20 at 00:16
  • Looks like incorrect use of aliases to me. `role`? – StackSlave Apr 14 '20 at 00:28

1 Answers1

1

IN ( SELECT ... ) is usually inefficient.

OR is usually inefficient.

JOINs are often more efficient than other ways of expressing something.

This

     where (pid=2 AND role)
        OR (pid=3 AND !role)

might be faster this way:

     where pid IN (2,3)
       AND ((pid=2 AND role)
         OR (pid=3 AND !role)
           )

The speedup might come if an index can be effectively used to limit the effort to piss 2 and 3 before doing that OR.

Try out what others Commented on and my suggestions, plus provide CREATE TABLE and EXPLAIN. Then I can advise on indexing.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Well, yesterday i came up with analyzing schema structure and i found out that i can simply create another table, the same as `entry` with different name and I can remove `principal` and `class` so there's no need for joins or sub-queries anymore. Thanks for your help, the tip over checking `pid` and `role` is on point. – HMD Apr 14 '20 at 07:34