Optimizing mysql queries on 200 million row table

Question

I have a mysql database with a table, obs, with about 200 million rows. There is a column, concept_id, which is a code of what the observation was about. For example, the concept_id for weight is 6012. I need to make a report on several indicators using multiple concepts. In total, there's about 20 concepts of interest. When I query the table, I do something like this:

select * from obs where concept_id in     (1285,1596,2051,1061,8307,8282,5356,1224,1251,1252,5089,5090,1570,1109,1261,6174,1110,1265,1111,1268,1270,5272,1836,1279,1855,5596,1146,1088,1255,1250,1252,1499,2158)

I load this resultset into a denormalized temporary table making each concept a column and then do some analytics.

The problem is it takes a very long time to do the initial select statement - tens of minutes. Given the size of this table (not so big), from my reading, it seems this should be done much faster (seconds) if the table is properly optimized.

The obs table is indexed by concept id. Here is the result of an explain:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  obs range   obs_concept,obs_complex_cvCpvoD_idx,obs_complex_cpvvD_idx   obs_complex_cvCpvoD_idx 4   NULL    19584   Using where; Using index

In my my.cnf settings, I have the following:

key_buffer              = 2GB
max_allowed_packet      = 512M
thread_stack            = 512K
thread_cache_size       = 16
query_cache_limit       = 1M
query_cache_size        = 16M
default-storage-engine = innoDB
innodb_file_per_table = 1
innodb_file_format = barracuda
innodb_strict_mode = 1
innodb_file_per_table
innodb_log_buffer_size                  = 32MB
innodb_buffer_pool_size                 = 16GB
innodb_additional_mem_pool_size         = 10M

Any suggestions for optimizing this setup/query would be very much appreciated.

UPDATE: The total number of rows returned is around 30 million. It takes roughly 20 minutes to run with the in. If I do each query individually, it takes about 10 minutes.

You have a 200 million rows and you said that's "not so big"!? — Kenan Zahirovic, Dec 11 '14 at 19:12
I think this might be useful to you: [SQL fixed-value IN() vs. INNER JOIN performance](http://stackoverflow.com/a/4771191/1115360). — Andrew Morton, Dec 11 '14 at 19:20
Seeing the SHOW CREATE TABLE output for your table might help ... with an index on just 'concept_id' this shouldn't really take long unless the total number of concept isn't much larger than the list of concept_ids given in IN() so that a massive amount of rows gets returned ... — Hartmut Holzgraefe, Dec 11 '14 at 21:25

Optimizing mysql queries on 200 million row table

0 Answers0