0

I have these tables in MySQL.

CREATE TABLE `tableA` (
  `id_a` int(11) NOT NULL,
  `itemCode` varchar(50) NOT NULL,
  `qtyOrdered` decimal(15,4) DEFAULT NULL,
             :
  PRIMARY KEY (`id_a`),
  KEY `INDEX_A1` (`itemCode`)
) ENGINE=InnoDB

CREATE TABLE `tableB` (
  `id_b` int(11) NOT NULL AUTO_INCREMENT,
  `qtyDelivered` decimal(15,4) NOT NULL,
  `id_a` int(11) DEFAULT NULL,
  `opType` int(11) NOT NULL,  -- '0' delivered to customer, '1' returned from customer
                   :
  PRIMARY KEY (`id_b`),
  KEY `INDEX_B1` (`id_a`)
  KEY `INDEX_B2` (`opType`)
) ENGINE=InnoDB

tableA shows how many quantity we received order from customer, tableB shows how many quantity we delivered to customer for each order.

I want to make a SQL which counts how many quantity remaining for delivery on each itemCode. The SQL is as below. This SQL works, but slow.

SELECT T1.itemCode,
       SUM(IFNULL(T1.qtyOrdered,'0')-IFNULL(T2.qtyDelivered,'0')+IFNULL(T3.qtyReturned,'0')) as qty
FROM tableA AS T1
LEFT JOIN (SELECT id_a,SUM(qtyDelivered) as qtyDelivered FROM tableB WHERE opType = '0' GROUP BY id_a) 
     AS T2 on T1.id_a = T2.id_a
LEFT JOIN (SELECT id_a,SUM(qtyDelivered) as qtyReturned  FROM tableB WHERE opType = '1' GROUP BY id_a) 
     AS T3 on T1.id_a = T3.id_a
WHERE T1.itemCode = '?'
GROUP BY T1.itemCode

I tried explain on this SQL, and the result is as below.

+----+-------------+------------+------+----------------+----------+---------+-------+-------+----------------------------------------------+
| id | select_type | table      | type | possible_keys  | key      | key_len | ref   | rows  | Extra                                        |
+----+-------------+------------+------+----------------+----------+---------+-------+-------+----------------------------------------------+
|  1 | PRIMARY     | T1         | ref  | INDEX_A1       | INDEX_A1 | 152     | const |     1 | Using where                                  |
|  1 | PRIMARY     | <derived2> | ALL  | NULL           | NULL     | NULL    | NULL  | 21211 |                                              |
|  1 | PRIMARY     | <derived3> | ALL  | NULL           | NULL     | NULL    | NULL  |    10 |                                              |
|  3 | DERIVED     | tableB     | ref  | INDEX_B2       | INDEX_B2 | 4       |       |    96 | Using where; Using temporary; Using filesort |
|  2 | DERIVED     | tableB     | ref  | INDEX_B2       | INDEX_B2 | 4       |       | 55614 | Using where; Using temporary; Using filesort |
+----+-------------+-------------------+----------------+----------+---------+-------+-------+----------------------------------------------+

I want to improve my query. How can I do that?

N.F.
  • 3,844
  • 3
  • 22
  • 53

1 Answers1

1

First, your table B has int for opType, but you are comparing to string via '0' and '1'. Leave as numeric 0 and 1. To optimize your pre-aggregates, you should not have individual column indexes, but a composite, and in this case a covering index. INDEX table B ON (OpType, ID_A, QtyDelivered) as a single index. The OpType to optimize the WHERE, ID_A to optimize the group by, and QtyDelivered for the aggregate in the index without going to the raw data pages.

Since you are looking for the two types, you can roll them up into a single subquery testing for either in a single pass result. THEN, Join to your tableA results.

SELECT 
      T1.itemCode,
      SUM( IFNULL(T1.qtyOrdered, 0 ) 
         - IFNULL(T2.qtyDelivered, 0)
         + IFNULL(T2.qtyReturned, 0)) as qty
   FROM 
      tableA AS T1
         LEFT JOIN ( SELECT 
                           id_a,
                           SUM( IF( opType=0,qtyDelivered, 0)) as qtyDelivered,
                           SUM( IF( opType=1,qtyDelivered, 0)) as qtyReturned
                        FROM 
                           tableB 
                        WHERE 
                           opType IN ( 0, 1 )
                        GROUP BY 
                           id_a) AS T2 
            on T1.id_a = T2.id_a
   WHERE 
      T1.itemCode = '?'
   GROUP BY 
      T1.itemCode

Now, depending on the size of your tables, you might be better doing a JOIN on your inner table to table A so you only get those of the item code you are expectin. If you have 50k items and you are only looking for items that qualify = 120 items, then your inner query is STILL qualifying based on the 50k. In that case would be overkill. In this case, I would suggest an index on table A by ( ItemCode, ID_A ) and adjust the inner query to

         LEFT JOIN ( SELECT 
                           b.id_a,
                           SUM( IF( b.opType = 0, b.qtyDelivered, 0)) as qtyDelivered,
                           SUM( IF( b.opType = 1, b.qtyDelivered, 0)) as qtyReturned
                        FROM 
                           ( select distinct id_a
                                from tableA
                                where itemCode = '?' ) pqA
                              JOIN tableB b
                                 on PQA.id_A = b.id_a
                                AND b.opType IN ( 0, 1 )
                        GROUP BY 
                           id_a) AS T2 

My Query against your SQLFiddle

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • I don't think there's any point in `WHERE opType IN (0, 1)` since those are the only two possible values. – Barmar May 24 '16 at 01:45
  • Thanks for reply. But using sub query makes my query slow, I think. I want to eliminate sub query. Can I do the same thing without sub query? – N.F. May 24 '16 at 01:55
  • @N.F., Which subquery. Even your initial sample shows TWO subqueries, of which I reduced to a single and accounting for both parts in one. The second sample, again, was based on amount of your data, items and querying everything regardless of the item and joining back to that item. You need AT LEAST 1 subquery to remove the POSSIBLE duplications of delivered / returned. – DRapp May 24 '16 at 02:06
  • Yes, I understand my query has 2 sub queries and you reduced to one. But I want to eliminate all the sub query. I need at least one sub query according to your comment, but using sub query is the reason my query is slow, I think. Do I have the way for another improvement? – N.F. May 24 '16 at 02:22
  • @N.F., Did you see my comment about having a proper INDEX (composite/covering) index? Individual indexes on individual pieces within the table are NOT going to be optimized. You need them in a SINGLE index. – DRapp May 24 '16 at 02:24
  • Thank you for your response. I understand about "covering index" for the first time. I added index (OpType, ID_A, QtyDelivered) on `tableB` and seems my query faster. – N.F. May 24 '16 at 04:19