The case:

I have been developing a web application in which I store data from different automated data sources. Currently I am using MySQL as the DBMS and PHP as the programming language on a shared LAMP server.

I use several tables to identify the data sources and two tables for the data updates. Data sources are in a three level hierarchy, and updates are timestamped.

One table contains the two upper levels of the hierarchy (geographic location and instrument), plus the time-stamp and an “update ID”. The other table contains the update ID, the third level of the hierarchy (meter) and the value.

Most queries involve a join between these two tables.
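A join of this shape usually benefits from indexes on the join column and on the filter columns. As a sketch, with hypothetical table and column names (`updates` for the first table, `readings` for the second — the real schema may differ):

```sql
-- Hypothetical names: updates(update_id, location_id, instrument_id, ts)
-- and readings(update_id, meter_id, value).
ALTER TABLE readings ADD INDEX ix_update (update_id);
ALTER TABLE updates  ADD INDEX ix_instr_ts (instrument_id, ts);

-- EXPLAIN shows whether the join actually uses these indexes:
EXPLAIN SELECT r.meter_id, r.value
FROM updates u
JOIN readings r ON r.update_id = u.update_id
WHERE u.instrument_id = 42
  AND u.ts >= NOW() - INTERVAL 1 DAY;
```

Checking the `EXPLAIN` output for full table scans is a quick way to confirm whether the existing indexes cover the common query patterns.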

Currently the first table contains nearly 2.5 million records (290 MB) and the second table has over 15 million records (1.1 GB). Each hour, about 500 records are added to the first table and 3,000 to the second, and I expect these numbers to increase. I don't think these numbers are too big, but I've been experiencing some performance drawbacks.

Most queries involve looking for immediate past activity (per site, per group of sites, and per instrument), which are no problem, but some involve summaries of daily, weekly and monthly activity (per site and per instrument). These pages take several seconds to load, sometimes exceeding the server's timeout (30 s).

It also seems that the automatic updates are suffering from these timeouts, causing the connection to fail.

The question:

Is there any rational way to split these tables so that queries perform more quickly? Or should I attempt other types of optimizations not involving splitting tables?

(I think the tables are properly indexed, and I know that a possible answer is to move to a dedicated server, probably running something other than MySQL, but I cannot make this move just yet, and any optimization would help in this scenario.)
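One way to "split" a table without changing the application is MySQL's native range partitioning, which lets time-bounded queries prune to the relevant partitions. A sketch, assuming a hypothetical `updates` table with a `ts` timestamp column (note that MySQL requires the partitioning column to be part of every unique key, including the primary key):

```sql
-- Partition by month so daily/weekly/monthly queries scan only
-- the partitions their WHERE clause on `ts` touches.
ALTER TABLE updates
  PARTITION BY RANGE (TO_DAYS(ts)) (
    PARTITION p2013_09 VALUES LESS THAN (TO_DAYS('2013-10-01')),
    PARTITION p2013_10 VALUES LESS THAN (TO_DAYS('2013-11-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
  );
```

New monthly partitions would need to be added periodically (e.g. by reorganizing `p_future`), which is part of the maintenance cost the comments below allude to.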

  • You can try partitioning the tables, if your queries don't have to read the entire tables but only a specific range, for example a particular day. But it's more a workaround (at most) than a permanent relief, plus there is a limit to the number of partitions you can have. For the longer term, you might want to look at more scalable sharding strategies. – Ashalynd Oct 16 '13 at 12:41
  • Any database migration should be done as soon as possible. – Aron Oct 16 '13 at 12:41

1 Answer


If the queries that are slow are the historical summary queries, then you might want to consider a data warehouse. As long as your historical data is relatively static, there is usually little risk in pre-calculating transactional summary data.

Data warehousing and designing schemas for Business Intelligence (BI) reporting is a very broad topic. You should read up on it and ask any specific BI design questions you may have.
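As a minimal starting point before a full warehouse, a pre-aggregated rollup table can serve the daily/weekly/monthly pages. A sketch, assuming hypothetical table and column names (`updates` and `readings` for the two tables described in the question):

```sql
-- One row per instrument, meter, and day.
CREATE TABLE daily_summary (
  instrument_id INT    NOT NULL,
  meter_id      INT    NOT NULL,
  day           DATE   NOT NULL,
  reading_count INT    NOT NULL,
  value_sum     DOUBLE NOT NULL,
  PRIMARY KEY (instrument_id, meter_id, day)
);

-- Refresh yesterday's rows, e.g. from a nightly cron job.
-- Re-running it is safe: existing rows are overwritten.
INSERT INTO daily_summary (instrument_id, meter_id, day, reading_count, value_sum)
SELECT u.instrument_id, r.meter_id, DATE(u.ts), COUNT(*), SUM(r.value)
FROM updates u
JOIN readings r ON r.update_id = u.update_id
WHERE u.ts >= CURDATE() - INTERVAL 1 DAY
  AND u.ts <  CURDATE()
GROUP BY u.instrument_id, r.meter_id, DATE(u.ts)
ON DUPLICATE KEY UPDATE
  reading_count = VALUES(reading_count),
  value_sum     = VALUES(value_sum);
```

Weekly and monthly summaries can then be computed from `daily_summary` instead of the raw tables, which keeps the reporting queries small regardless of how large the raw tables grow.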

Joel Brown