1

My SQL Query with all the filters applied is returning 10 lakhs (one million) records . To get all the records it is taking 76.28 seconds .. which is not acceptable . How can I optimize my SQL Query which should take less time. The Query I am using is :

    SELECT cDistName , cTlkName, cGpName, cVlgName , 
           cMmbName , dSrvyOn 
      FROM sspk.villages 
 LEFT JOIN gps  ON nVlgGpID = nGpID
 LEFT JOIN TALUKS ON nGpTlkID = nTlkID   
 left JOIN dists ON nTlkDistID = nDistID
 LEFT JOIN HHINFO ON nHLstGpID = nGpID
 LEFT JOIN MEMBERS ON nHLstID = nMmbHhiID
 LEFT JOIN BNFTSTTS  ON nMmbID = nBStsMmbID
 LEFT JOIN STATUS ON nBStsSttsID = nSttsID
 LEFT JOIN  SCHEMES ON  nBStsSchID = nSchID
     WHERE (
               (nMmbGndrID = 1 and nMmbAge between 18 and 60) 
           or  (nMmbGndrID = 2 and nMmbAge between 18 and 55)
           )
      AND cSttsDesc like 'No, Eligible' 
      AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
 GROUP BY cDistName , cTlkName, cGpName, cVlgName , 
        DATE_FORMAT(dSrvyOn , '%m-%Y')

I have searched on the forum and outside and used some of the tips given but it hardly makes any difference . The joins that i have used in above query is left join all on Primary Key and Foreign key . Can any one suggest me how can I modify this sql to get less execution time ....

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Abhishek
  • 55
  • 2
  • 2
  • 8
  • 3
    MySQLs `explain` feature is your helper in this... – arkascha Nov 21 '12 at 13:26
  • 1
    And keep in miind that one of the problems of relational databases is that joins simply scale very badly. – arkascha Nov 21 '12 at 13:27
  • table structure?, probably you would like to change the like : http://stackoverflow.com/questions/792875/which-sql-query-is-better-match-against-or-like – jcho360 Nov 21 '12 at 13:28
  • Depending on how you use the data, you might be able to cache the result using memcached or some other object caching mechanism. – Patrick James McDougle Nov 21 '12 at 13:41
  • Please post the output of `explain` and `explain extended`. – Patricia Nov 21 '12 at 14:12
  • if nMmbGndrID and nMmbAge are say in members, then consider creating a column like bnfQual and set it to true if 1 and betw 18 and 60 or 2 and betw 18 and 55. (need to modify it as they age). then you can use that column and ditch the OR. but you are clearly updating their age anyway so stick it in that routine. and i dont understand the like it is more like an =. also get it in a stored proc. – Drew Nov 21 '12 at 14:21

7 Answers7

2

You are, sir, a very demanding user of MySQL! A million records retrieved from a massively joined result set at the speed you mentioned is 76 microseconds per record. Many would consider this to be acceptable performance. Keep in mind that your client software may be a limiting factor with a result set of that size: it has to consume the enormous result set and do something with it.

That being said, I see a couple of problems.

First, rewrite your query so every column name is qualified by a table name. You'll do this for yourself and the next person who maintains it. You can see at a glance what your WHERE criteria need to do.

Second, consider this search criterion. It requires TWO searches, because of the OR.

 WHERE (
           (MEMBERS.nMmbGndrID = 1 and MEMBERS.nMmbAge between 18 and 60) 
       or  (MEMBERS.nMmbGndrID = 2 and MEMBERS.nMmbAge between 18 and 55)
       )

I'm guessing that these criteria match most of your population -- females 18-60 and males 18-55 (a guess). Can you put the MEMBERS table first in your list of LEFT JOINs? Or can you put a derived column (MEMBERS.working_age = 1 or some such) in your table?

Also try a compound index on (nMmbGndrID,nMmbAge) on MEMBERS to speed this up. It may or may not work.

Third, consider this criterion.

  AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )

You've applied a function to the dSrvyOn column. This defeats the use of an index for that search. Instead, try this.

  AND dSrvyOn >= '2102-08-01'
  AND dSrvyOn <  '2012-08-01' + INTERVAL 1 MONTH

This will, if you have an index on dSrvyOn, do a range search on that index. My remark also applies to the function in your ORDER BY clause.

Finally, as somebody else mentioned, don't use LIKE to search where = will do. And NEVER use column LIKE '%something%' if you want acceptable performance.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • To be fair, the OP doesn't say that the speed is unacceptable to him/her, it could be unacceptable to his/her boss - whoever it is that has an interest in over a million people's benefit status... – Paul Gregory Nov 21 '12 at 13:54
  • takes 76 seconds to fetch it, not shabby, but probably 6 hrs for the recipient to consume it. – Drew Nov 21 '12 at 13:59
  • There's nothing at all wrong with being a demanding user. But it's helpful for a developer, and a boss, to know they are being demanding. – O. Jones Nov 21 '12 at 13:59
1

You claim yourself you base your joins on good and unique indexes. So there is little to be optimized. Maybe a few hints:

  • try to optimize your table layout, maybe you can reduce the number of joins required. That probably brings more performance optimization than anything else.

  • check your hardware (available memory and things) and the server configuration.

  • use mysqls explain feature to find bottle necks.

  • maybe you can create an auxilliary table especially for this query, which is filled by a background process. That way the query itself runs faster, since the work is done before the query in background. That usually works if the query retrieves data that must not neccessarily be synchronous with every single change in the database.

  • check if an RDBMS is really the right type of database. For many purposes graph databases are much more efficient and offer better performance.

arkascha
  • 41,620
  • 7
  • 58
  • 90
  • Thanks for the reply . The explain feature is not working with this query may be due to complexity . Trying to optimize my query by reducing the number of join ... – Abhishek Nov 22 '12 at 08:40
0

Try adding an index to nMmbGndrID, nMmbAge, and cSttsDesc and see if that helps your queries out.

Additionally you can use the "Explain" command before your select statement to give you some hints on what you might do better. See the MySQL Reference for more details on explain.

Joe Meyer
  • 4,315
  • 20
  • 28
0

This SQL has many things that are redundant that may not show up in an explain.

If you require a field, it shouldn't be in a table that's in a LEFT JOIN - left join is for when data might be in the joined table, not when it has to be.

If all the required fields are in the same table, it should be the in your first FROM.

If your text search is predictable (not from user input) and relates to a single known ID, use the ID not the text search (props to Patricia for spotting the LIKE bottleneck).

Your query is hard to read because of the lack of table hinting, but there does seem to be a pattern to your field names.

You require nMmbGndrID and nMmbAge to have a value, but these are probably in MEMBERS, which is 5 left joins down. That's a redundancy.

Remember that you can do a simple join like this:

FROM sspk.villages, gps, TALUKS, dists, HHINFO, MEMBERS [...] WHERE [...] nVlgGpID = nGpID AND nGpTlkID = nTlkID AND nTlkDistID = nDistID AND nHLstGpID = nGpID AND nHLstID = nMmbHhiID

It looks like cSttsDesc comes from STATUS. But if the text 'No, Eligible' matches exactly one nBStsSttsID in BNFTSTTS then find out the value and use that! If it is 7, take out LEFT JOIN STATUS ON nBStsSttsID = nSttsID and replace AND cSttsDesc like 'No, Eligible' with AND nBStsSttsID = '7'. This would see a massive speed improvement.

Paul Gregory
  • 1,733
  • 19
  • 25
  • First of All Thanks for the reply . – Abhishek Nov 22 '12 at 08:28
  • First of All Thanks for the reply . The naming of table is correct as you have identify . The way you have given the solution is really executing fast but it is missing some records according to my test data . Also , the text 'No, Eligible' is a status for a benefits and it is not matching exactly one nBStsSttsID in BNFTSTTS so i had to use the more redundant method 'LIKE' To get teh result . However If possible i will try to normalize my table structure to reduce teh execution time further . Any way thanks for the solution and once solved i will post my original query here . Thank you – Abhishek Nov 22 '12 at 08:38
  • Well then Patricia's idea of adding a boolean field to BNFTSTTS and setting it to true for each of the nBStsSttsIDs you need in this query will help you, as you can just `nBStsSttsIsLikeNoEligible = true`. You could even populate this using a query. – Paul Gregory Nov 22 '12 at 12:48
0

If the tables used in joins are least use for updates queries, then you can probably change the engine type from INNODB to MyISAM.

Select queries in MyISAM runs 2x faster then in INNODB, but the updates and insert queries are much slower in MyISAM.

Jignesh Parmar
  • 68
  • 1
  • 1
  • 9
0

You can create Views in order to avoid long queries and time.

vishal shah
  • 212
  • 1
  • 3
  • 15
0

Your like operator could be holding you up -- full-text search with like is not MySQL's strong point.

Consider setting a fulltext index on cSttsDesc (make sure it is a TEXT field first).

ALTER TABLE articles ADD FULLTEXT(cSttsDesc);

SELECT
    *
FROM
    table_name
 WHERE MATCH(cSttsDesc) AGAINST('No, Eligible')

Alternatively, you can set a boolean flag instead of cSttsDesc like 'No, Eligible'.

Source: http://devzone.zend.com/26/using-mysql-full-text-searching/

Patricia
  • 916
  • 6
  • 12
  • 1
    Good spot. For a quick fix, moving the `like` line to after the date filter would at least mean fewer records are subjected to the full-text search. – Paul Gregory Nov 21 '12 at 13:48
  • Unfortunately (or fortunately) the query optimizer will have already determined the best order to run the filters, so moving `like` around should have no effect on runtime. Seeing the `explain` output would certainly help. – Patricia Nov 21 '12 at 13:58
  • I stand corrected. But doesn't giving the query optimizer less to do save a millisecond or two? In any event, on further reflection, I now strongly suspect that the 'No, Eligible' text is from STATUS and already has an int ID in BNFTSTTS. – Paul Gregory Nov 21 '12 at 14:00