0

I'm trying to

  • get info from table final_stock_ul (4M records) only when there's an entry in another table final_stock_etablissements (10M records) matching on the "siren" code
  • limit the result to 1000
  • insert everything in a temporary table

Following code is very slow (15 sec) :

DROP TABLE IF EXISTS temp_results ; 

CREATE TEMPORARY TABLE IF NOT EXISTS temp_results (
    siren INT,
    denomination VARCHAR(255)

)  ENGINE=MYISAM DEFAULT CHARSET=utf8  ;


INSERT INTO temp_results (
    siren,
    denomination
)
SELECT 
    ul.siren,
    ul.denomination

FROM dw.final_stock_ul ul

WHERE 

     exists 
            (
                SELECT 1
                FROM dw.final_stock_etablissements s
                WHERE code_postal =  69001
                AND s.siren = ul.siren
            ) 

LIMIT 1000

However the "SELECT" part is very fast by itself (0.078 sec) :

SELECT 
    ul.siren,
    ul.denomination

FROM dw.final_stock_ul ul

WHERE 

     exists 
            (
                SELECT 1
                FROM dw.final_stock_etablissements s
                WHERE code_postal =  69001
                AND s.siren = ul.siren
            ) 

LIMIT 1000

How can the INSERT be so much slower than the SELECT ?

(with only 1000 records to insert)

EDIT: added missing field in INSERT statement

Bruno
  • 63
  • 6
  • 1
    The `INSERT` mentions one column; the `SELECT` mentions 2 columns. Please fix. – Rick James Jul 16 '19 at 19:50
  • Can there be multiple matching sirens in final_stock_etablissements? (I am looking into using `JOIN` instead of `EXISTS`.) – Rick James Jul 16 '19 at 19:53
  • Do you have `INDEX(postal_code, siren)`? – Rick James Jul 16 '19 at 19:54
  • @RickJames Yes they are a lot of possible matches ("etablissement" is a location and "siren" is a company number, so they can be multiple locations for one company) - And YES i added a composite index on postal_code and siren, without any result. Still cannot figure out why the insert would be so slow – Bruno Jul 17 '19 at 13:10

2 Answers2

0

I couldn't find an answer to the slow performance :/

However the underlying problem is very common and documented on the web :

List items only if matching another table

It appears that the "IF EXISTS" solution or the "WHERE IN" solutions are slow on large tables

The best solution is to use a JOIN (much faster).

However its doesnt solve completely the issue if you have multiple matches in the second table as you'll get several time the same rows as a result (LEFT JOIN only first row)

The solution I've found is :

  • Create a specific table to join on, with only one possible match for the common index and the where condition - In my case I built a table final_stock_etablissements_derived with two indexed columns : siren and code_postal (1 row per siren x code postal)
  • Make an inner join on the index (siren in my case) and apply the where condition at the end
  • Both indexes are used and the overall is very fast
Bruno
  • 63
  • 6
0

If you're counting on the composite index for performance (which is absent in your CREATE TABLE statement), try using InnoDB as the engine for the temporary table, it might help (see Rick James' answer here: MyISAM vs InnoDB for quick inserts and a composite unique key)...

Jan
  • 26
  • 4