I have a mysql database with a table, obs, with about 200 million rows. There is a column, concept_id, which is a code of what the observation was about. For example, the concept_id for weight is 6012. I need to make a report on several indicators using multiple concepts. In total, there's about 20 concepts of interest. When I query the table, I do something like this:
select * from obs where concept_id in (1285,1596,2051,1061,8307,8282,5356,1224,1251,1252,5089,5090,1570,1109,1261,6174,1110,1265,1111,1268,1270,5272,1836,1279,1855,5596,1146,1088,1255,1250,1252,1499,2158)
I load this resultset into a denormalized temporary table making each concept a column and then do some analytics.
The problem is it takes a very long time to do the initial select statement - tens of minutes. Given the size of this table (not so big), from my reading, it seems this should be done much faster (seconds) if the table is properly optimized.
The obs table is indexed by concept id. Here is the result of an explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE obs range obs_concept,obs_complex_cvCpvoD_idx,obs_complex_cpvvD_idx obs_complex_cvCpvoD_idx 4 NULL 19584 Using where; Using index
In my my.cnf settings, I have the following:
key_buffer = 2GB
max_allowed_packet = 512M
thread_stack = 512K
thread_cache_size = 16
query_cache_limit = 1M
query_cache_size = 16M
default-storage-engine = innoDB
innodb_file_per_table = 1
innodb_file_format = barracuda
innodb_strict_mode = 1
innodb_file_per_table
innodb_log_buffer_size = 32MB
innodb_buffer_pool_size = 16GB
innodb_additional_mem_pool_size = 10M
Any suggestions for optimizing this setup/query would be very much appreciated.
UPDATE: The total number of rows returned is around 30 million. It takes roughly 20 minutes to run with the in. If I do each query individually, it takes about 10 minutes.