0

I am working on MySql database. Where I need to merge information form multiple(more than 10) tables into a single one. In order to do that I am following a typical joining style.

Select * from 
table_1 
Join table_2
on(table_1.id = table_2.id)
Join table_3
on(table_1.id = table_3.id)

It works but I suffer a lot during execution time. Is there any other good way to optimize my code? Following is the sample of my code:

SELECT
distinct 
u.Id, 
oc.dt,
Daily_Number_Outgoing_Calls,     
Daily_Number_Incoming_Calls,    
Daily_duration_Outgoing_Calls

FROM
creditfix.users u

JOIN

#1 Daily_No_Out_Calls
    (
        SELECT
        cl.uId,SUBSTRING(DATE,1,10) as dt,
        count(1) as Daily_Number_Outgoing_Calls

        From creditfix.call_logs as cl
            WHERE
                cl.`type`=2 #out going calls only
        GROUP by cl.uId,dt
    ) oc
    ON (u.Id=oc.Uid)

#2 Daily_No_In_Calls
    JOIN
    (
        SELECT
        cl.uId, SUBSTRING(DATE,1,10) as dt,
        count(1) as Daily_Number_Incoming_Calls
        From creditfix.call_logs as cl
        WHERE
            cl.`type`=1 #incoming calls only
        GROUP by cl.uId,dt
    ) ic
    ON (u.Id=ic.Uid)

#3 Daily_duration_Out_Calls
     JOIN
    (
        SELECT
        cl.uId,SUBSTRING(DATE,1,10) as dt, 
        (sum(duration)) as Daily_duration_Outgoing_Calls
        From creditfix.call_logs as cl
        WHERE
            cl.`type`=2 #out going calls only
        GROUP by cl.uId,dt
    ) od
    ON (u.Id=od.uid)
    # It goes on like this...
Nomiluks
  • 2,052
  • 5
  • 31
  • 53
  • 2
    Please get out of the habit of using column numbers instead of names in `GROUP BY`, it makes queries harder to read. – Barmar May 08 '17 at 18:27
  • 4
    And the become fragile when you edit code, in case you insert columns in the `SELECT` list. – Barmar May 08 '17 at 18:27
  • 2
    The performance problem is because you're not just joining tables, you're joining subqueries. These don't have indexes like real tables do, so they're hard for MySQL to optimize. – Barmar May 08 '17 at 18:29
  • @Barmar hi, I am newbie and made edits in my code :) thanks for your comments – Nomiluks May 08 '17 at 18:36
  • @Barmar is there any way of indexing such queries ? – Nomiluks May 08 '17 at 18:36
  • Why are the subqueries only doing a single date, but then also grouping by the date? Is that just for simplification? – Barmar May 08 '17 at 18:39
  • @Barmar yes, its for simplifications. we can remove them. Let me remove it – Nomiluks May 08 '17 at 18:40

2 Answers2

5

It looks like you don't need to use separate subqueries for each column, you should be able to do them in a single subquery.

I also don't think you should need DISTINCT in the main query.

SELECT 
    u.Id, 
    cl.dt,
    cl.Daily_Number_Outgoing_Calls,     
    cl.Daily_Number_Incoming_Calls,    
    cl.Daily_duration_Outgoing_Calls,   
    cl.Daily_duration_Incoming_Calls #.... for keep on adding like this

FROM creditfix.users u
JOIN (
    SELECT uId, SUBSTRING(DATE, 1, 10) AS dt,
        SUM(`type`=2) AS Daily_Number_Outgoing_Calls,
        SUM(`type`=1) AS Daily_Number_Incoming_Calls,
        SUM(IF(`type`=2, duration, 0)) AS Daily_duration_Outgoing_Calls,
        SUM(IF(`type`=1, duration, 0)) AS Daily_duration_Incoming_Calls
    FROM creditfix.call_logs as cl
    GROUP BY uId, dt) AS cl
ON u.Id = cl.uId

See multiple query same table but in different columns mysql for the logic used in the subquery to get all the counts.

Community
  • 1
  • 1
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Definitely a good idea, but out of curiosity, why use a subquery at all? It seems like it would be more readily understandable without one. – Joe Farrell May 08 '17 at 18:45
  • 1 possible reason is if creditfix.users has a many-to-many with call_logs. by aggregating before the join you eliminate the risk of artificially increasing the counts. – xQbert May 08 '17 at 18:54
  • 2
    @xQbert That's an issue in some queries, but here we seem to be joining with a unique ID, and we're grouping the subquery by that foreign key. The more general reason is that joining after aggregating reduces the size of the join. – Barmar May 08 '17 at 18:59
3

As mentioned in the comments, these aren't simple joins, these are subquery joins which makes optimization more difficult. You'll have to optimize each subquery, or figure a way to not need subqueries.

Since you want to get call log info per user and type for a given day, this can be done with a simple join and group by. No subqueries necessary.

select
    ???
from
    creditfix.users u
join
    creditfix.call_logs as cl on u.id = cl.uid
where
    substring(date,1,10)=???
group by
    cl.uid, cl.type;

So to replicate what it appears you're going for, the number of calls and their total duration...

select
    u.id, cl.type, count(cl.id) as num_calls, sum(cl.duration) as duration
from
    creditfix.users u
join
    creditfix.call_logs as cl on u.id = cl.uid
where
    substring(date,1,10)='2017-03-18'
group by
    cl.uid, cl.type;

You'll get something like this.

+----+------+-----------+---------------+
| id | type | num_calls | call_duration |
+----+------+-----------+---------------+
|  1 |    1 |         3 |            20 |
|  1 |    3 |         1 |            10 |
|  1 |    5 |         2 |             4 |
|  2 |    5 |         1 |             4 |
+----+------+-----------+---------------+

This loses the ability to name each individual column, but that's something whatever is receiving the query can handle. Or it can be handled with a single subquery.

The types can be named with a case...

case cl.type
    when 1 then 'outgoing'
    when 2 then 'incoming'
    when 3 then ...
    else cl.type
end as type

...but this requires hard coding magic numbers in queries. you'd be better off making a table to store info about the types and joining with that.


The subqueries themselves have a potential performance problem here: substring(date,1,10) = '2017-03-08'. If date is not indexed, the queries will have to do a full table scan.

Having date as a string introduces performance problems. The database has to perform string operations on each row, though it's possible MySQL is smart enough to use the index. Whereas with a datetime type is a simple numeric comparison and will use an index. It's also a bit smaller, 8 bytes.

It allows you to use the date and time functions without converting. SUBSTRING(DATE,1,10) can be replaced with the faster and safer date(date).

Also naming a column date is a bad idea, it's a function name in MySQL and might cause problems.

Schwern
  • 153,029
  • 25
  • 195
  • 336