4

I have a simple table (created by django) - engine InnoDB:

+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(11)          | NO   | PRI | NULL    | auto_increment |
| correlation | double           | NO   |     | NULL    |                |
| gene1_id    | int(10) unsigned | NO   | MUL | NULL    |                |
| gene2_id    | int(10) unsigned | NO   | MUL | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+

The table has more than 411 million rows. (The target table will have around 461M rows, 21471*21470 rows)

My main query looks like this, there might be up to 10 genes specified at most.

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL 

This query is very slow, it takes almost 2 mins to run:

21471 rows in set (1 min 11.03 sec)

Indexes (cardinality looks strange - too small?):

  Non_unique| Key_name                                         | Seq_in_index | Column_name | Collation | Cardinality |
          0 | PRIMARY                                          |            1 | id          | A         |   411512194 | 
          1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez  |            1 | gene1_id    | A         |          18 |
          1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez |            1 | gene2_id    | A         |          18 | 

I just run select count(*) on that table and it took 22 mins:

select count(*) from predictions_genescorrelation;

+-----------+
| count(*)  |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)

What could be wrong? I suspect that mysql configuration is not set up right.

During the import of data I experienced problem with space, so that might also affected the database, although I ran check table later - it took 2hours and stated OK.

Additionally - the cardinality of the indexes look strange. I have set up smaller database locally and there values are totally different (254945589,56528,17).

Should I redo indexes? What params should I check of MySQL? My tables are set up as InnoDB, would MyISAM make any difference?

Thanks, matali

matali
  • 186
  • 1
  • 10
  • 1
    I think that this question would be better suitable on http://dba.stackexchange.com since it involves more configuration than query performance. – Jorge Campos Aug 12 '15 at 15:53
  • For a query like this I would create an index `(gene2_id, gene1_id, correlation)`. Additionally the `id` serial might be totally useless, do you ever use it in WHERE-conditions? And what's your logical Primary Key, `(gene2_id, gene1_id)`? – dnoeth Aug 12 '15 at 16:04
  • Do you need `id`? It seems like `PRIMARY KEY(gene2_id, gene1_id)` would be unique, and much faster. Also, the gene_ids could be `SMALLINT UNSIGNED` to be 2 bytes instead of the current 2 bytes. – Rick James Aug 16 '15 at 01:27
  • 1
    Stay with InnoDB. But check `innodb_buffer_pool_size`; it should be about 70% of RAM. It would be especially good if it were bigger than the table size. – Rick James Aug 16 '15 at 01:28
  • The logic for putting this on hold is bogus. It's an optimization question for which there are many answers, most of them orthogonal and additive. – Rick James Aug 16 '15 at 01:30
  • Thanks for hints. I am going to ask to increase innodb_buffer_pool_size. – matali Aug 17 '15 at 02:24

1 Answers1

5

https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/

SELECT COUNT(*) queries are very slow without WHERE clause or without SELECT COUNT(id) ... USE INDEX (PRIMARY).

to speedup this:

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL

you should have composite key on (gene2_id, gene1_id, correlation) in that order. try

About index-cardinality: stats of Innodb tables are approximate, not accurate (sometimes insane). there even was (IS?) a bug-report https://bugs.mysql.com/bug.php?id=58382

Try to ANALIZE table and watch cardinality again

M0rtiis
  • 3,676
  • 1
  • 15
  • 22