
We have a data warehouse with denormalized tables ranging from 500K to 6+ million rows. I am developing a reporting solution, so we are using database paging for performance reasons. Our reports have search criteria, and we have created the necessary indexes; however, performance is poor when querying the multi-million-row tables. The client is set on always seeing the total record count, so I have to fetch both the data and the count.
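For reference, the paging pattern in use looks roughly like this (table and column names here are made up for illustration):

```sql
-- Hypothetical sketch of the current approach: one query for the page of
-- rows, one for the total count the client insists on seeing.
SELECT id, col_a, col_b
FROM big_report_table
WHERE search_col = 'some criterion'
ORDER BY id
LIMIT 25 OFFSET 50000;   -- deep offsets like this are where it gets slow

SELECT COUNT(*)
FROM big_report_table
WHERE search_col = 'some criterion';
```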

Are there any other things I can do to help with performance? I'm not the MySQL dba and he has not really offered anything up, so I'm not sure what he can do configuration wise.

Thanks!

Marco
    have you considered data sharding? http://en.wikipedia.org/wiki/Shard_(database_architecture) – AJ. Apr 20 '11 at 16:56
  • the following example has a table with 500+ million rows and runtimes of 0.02 seconds. http://stackoverflow.com/questions/4419499/mysql-nosql-help-me-to-choose-the-right-one-on-a/4421601#4421601 – Jon Black Apr 20 '11 at 18:14

2 Answers


You should use partitioning.

Its main goal is to reduce the amount of data read for particular SQL operations, so that overall response time is reduced.

Refer:

http://dev.mysql.com/tech-resources/articles/performance-partitioning.html
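As a minimal sketch of the idea (table and column names are hypothetical), range-partitioning a large fact table by date lets queries that filter on the date column scan only the relevant partitions:

```sql
-- Hypothetical sketch: partition a large fact table by year so that
-- date-filtered report queries prune away the other partitions.
CREATE TABLE fact_sales (
    id        BIGINT NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```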


If you partition the large tables and store the parts on different servers, then your queries will run faster.

see: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Also note that with NDB tables you can use HASH keys, which are looked up in O(1) time.
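A minimal sketch of such a table, assuming a MySQL Cluster (NDB) setup and hypothetical names:

```sql
-- Hypothetical sketch: an NDB table whose primary key uses hash-based
-- lookup, giving constant-time access by key.
CREATE TABLE fact_lookup (
    id     BIGINT NOT NULL,
    amount DECIMAL(10,2),
    PRIMARY KEY (id) USING HASH
) ENGINE = NDBCLUSTER;
```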

For the row count, you can keep a running total in a separate table and update it, for example in AFTER INSERT and AFTER DELETE triggers.
Although the triggers will slow down inserts and deletes, that cost is spread over time. Note that you don't have to keep all totals in one row; you can store a total per condition. Something like:

table    field    condition    row_count
----------------------------------------
table1   field1   cond_x       10
table1   field1   cond_y       20

select sum(row_count) as count_cond_xy
from totals
where `table` = 'table1' and field = 'field1'
  and `condition` like 'cond_%';
-- just a silly example; you can come up with more efficient code, but I hope
-- you get the gist of it.
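The trigger side of this might look something like the sketch below. All table, column, and trigger names are hypothetical, and it assumes `totals` has a unique key on (`table`, field, `condition`):

```sql
-- Hypothetical sketch: keep the totals table in sync with bigtable.
DELIMITER //

CREATE TRIGGER bigtable_ai AFTER INSERT ON bigtable
FOR EACH ROW
BEGIN
    -- Bump the counter for this row's condition, creating it if missing.
    INSERT INTO totals (`table`, field, `condition`, row_count)
    VALUES ('bigtable', 'field1', NEW.some_condition, 1)
    ON DUPLICATE KEY UPDATE row_count = row_count + 1;
END//

CREATE TRIGGER bigtable_ad AFTER DELETE ON bigtable
FOR EACH ROW
BEGIN
    -- Decrement the counter for the deleted row's condition.
    UPDATE totals
    SET row_count = row_count - 1
    WHERE `table` = 'bigtable' AND field = 'field1'
      AND `condition` = OLD.some_condition;
END//

DELIMITER ;
```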

If you find yourself always counting along the same conditions, this can speed up your redesigned select count(x) from bigtable where ... from minutes to instant.

Johan