I have a table to which approximately 100,000 rows are added every day, and I am supposed to generate reports from this table. I am using PHP to generate these reports. Recently the script which used to do this has been taking too long to complete. How can I improve the performance by shifting to something other than MySQL which is scalable in the long run?
- Is the table correctly indexed? – d.syph.3r Apr 21 '11 at 12:50
- In addition to @d.syph.3r's question: can you give some indication of the type of data that's being stored in the volume you describe? Are all of the rows essentially going into one table, or is the data going into a number of different tables? – Gary Barnett Apr 21 '11 at 12:54
- What is the engine of the table in question, what are the configuration settings of your MySQL instance, are you using PHP to actually calculate the results from the dataset or do you do the report calculation directly in MySQL itself, what does your EXPLAIN SELECT output, and what does your SHOW CREATE TABLE output? Without these answers, we can only guess blindly. – Michael J.V. Apr 21 '11 at 12:59
- Hi, everything is in a single table (MyISAM); the data is transaction records: an ID, an amount, a timestamp, and a code representing the location from where the transaction is done. – user557348 Apr 22 '11 at 04:08
8 Answers
MySQL is very scalable, that's for sure.
The key is not changing the database from MySQL to something else; instead you should:
- Optimize your queries (this can sound obvious, but a huge improvement I made some time ago was to change SELECT * into selecting only the column(s) I need; it's a frequent issue I see in other people's code too).
- Optimize your table design (normalization etc.).
- Add indexes on the column(s) you use frequently in the queries (a short sketch follows below).
Similar advice here
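For illustration, a minimal sketch of the last two points; the table and column names here are assumptions based on the OP's comment, not the actual schema:

-- Select only the columns the report needs, not SELECT *
SELECT id, amount, created_at, location_code
FROM transactions
WHERE created_at >= '2011-04-01';

-- Add an index on the column(s) the WHERE clause filters on
ALTER TABLE transactions ADD INDEX idx_created_at (created_at);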

For generating reports or file downloads with large amounts of data you should consider using flush() and increasing PHP's max_execution_time and memory_limit settings.
I doubt the problem lies in the number of rows, since MySQL can handle a lot of rows. But you can of course fetch x rows at a time and process them in chunks.
I do assume your MySQL is properly tweaked for performance.
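For example, a minimal sketch of chunked fetching using keyset pagination; the table and column names are assumptions, not the OP's actual schema:

-- Fetch rows 10,000 at a time, keyed on the primary key instead of OFFSET,
-- so each chunk is a cheap range scan rather than a scan-and-discard.
SELECT id, amount, created_at, location_code
FROM transactions
WHERE id > 0              -- replace 0 with the last id seen in the previous chunk
ORDER BY id
LIMIT 10000;
-- Repeat, raising the id bound each time, until no rows are returned.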

First analyse why (or: whether) your queries are slow: http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
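For example, a sketch of how EXPLAIN is used against a (hypothetical) report query:

EXPLAIN
SELECT location_code, SUM(amount)
FROM transactions
WHERE created_at BETWEEN '2011-03-01' AND '2011-04-01'
GROUP BY location_code;
-- In the output, type = ALL together with a large "rows" estimate means a full
-- table scan, i.e. a missing or unusable index for this query.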

You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes (only available with InnoDB!).
The linked example includes a table with 500 million rows and query times of 0.02 seconds.
MySQL and NoSQL: Help me to choose the right one
Hope you find this of interest.
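As a rough illustration of the clustered-index idea (a hypothetical schema, not taken from the linked answer): InnoDB stores rows physically in primary-key order, so a composite primary key that leads with the columns your range queries filter on keeps each report's rows physically together:

CREATE TABLE transactions (
  id            INT UNSIGNED NOT NULL,
  location_code SMALLINT UNSIGNED NOT NULL,
  created_at    DATETIME NOT NULL,
  amount        DECIMAL(10,2) NOT NULL,
  -- Rows are clustered by (location_code, created_at, id), so a report for one
  -- location over a date range reads a contiguous slice of the table.
  PRIMARY KEY (location_code, created_at, id)
) ENGINE=InnoDB;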
Another thought is to move records beyond a certain age to a historical database for archiving, reporting, etc. If you don't need that large volume for transactional processing it might make sense to extract them from the transactional data store.
It's common to separate transactional and reporting databases.
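A minimal sketch of that separation, assuming a hypothetical transactions table and a 90-day cutoff (adjust names and retention to your schema, and run it during a quiet period or in batches):

-- Create the archive table with the same structure, then move old rows across.
CREATE TABLE transactions_archive LIKE transactions;

INSERT INTO transactions_archive
SELECT * FROM transactions
WHERE created_at < CURDATE() - INTERVAL 90 DAY;

DELETE FROM transactions
WHERE created_at < CURDATE() - INTERVAL 90 DAY;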

- Hi, this table has transaction data (ID, amount, timestamp, and a code representing the location from where the transaction is done). "It's common to separate transactional and reporting databases" – how? – user557348 Apr 22 '11 at 04:14
I am going to make some assumptions:
- Your 100k rows added every day have timestamps which are either real-time or offset by a relatively short amount of time (hours at most), and are added either throughout the day or in a few big batches.
- The data are never updated.
- You are using the InnoDB engine (frankly, you would be insane to use MyISAM for large tables, because in the event of a crash the index rebuild takes a prohibitive amount of time).
You haven't explained what kind of reports you're trying to generate, but I'm assuming that your table looks like this:
CREATE TABLE logdata (
  dateandtime some_timestamp_type NOT NULL,
  property1 some_type_1 NOT NULL,
  property2 some_type_2 NOT NULL,
  some_quantity some_numerical_type NOT NULL,
  ... some other columns not required for reports ...
  ... some indexes ...
);
And that your reports look like:
SELECT count(*), SUM(some_quantity), property1 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property1;
SELECT count(*), SUM(some_quantity), property2 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property2;
Now, as we can see, both of these reports scan a large portion of the table, because you are reporting on a lot of rows.
The bigger the time range becomes the slower the reports will be. Moreover, if you have a lot of OTHER columns (say some varchars or blobs) which you aren't interested in reporting on, then they slow your report down too (because the server still needs to inspect the rows).
You can use several possible techniques for speeding this up:
- Add a covering index for each type of report, to support the columns you need and omit columns you don't. This may help a lot but will slow inserts down.
- Summarise data according to the dimension(s) that you want to report on. In this fictitious case, all your reports are either counting rows or SUM()ing some_quantity.
- Build mirror tables (containing the same data) which have appropriate primary keys / indexes / columns to make the reports faster.
- Use a columnar engine (e.g. Infobright).
Summarisation is usually an attractive option if your use-case supports it; a sketch of the first two techniques follows below.
You may wish to ask a more detailed question with an explanation of your use-case.
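A hedged sketch of the covering-index and summary-table approaches, using the hypothetical logdata table above (the index name, summary-table name, and concrete types are assumptions, not the OP's actual schema):

-- 1) Covering index: the property1 report can be answered entirely from the
--    index, without reading the wide rows (and their blobs/varchars).
ALTER TABLE logdata
  ADD INDEX idx_report_p1 (dateandtime, property1, some_quantity);

-- 2) Summary table: pre-aggregate once per day, then report from the summary.
CREATE TABLE logdata_daily_summary (
  report_date DATE          NOT NULL,
  property1   VARCHAR(32)   NOT NULL,   -- assumed type; match logdata.property1
  row_count   INT UNSIGNED  NOT NULL,
  total_qty   DECIMAL(14,2) NOT NULL,   -- assumed type; match some_quantity
  PRIMARY KEY (report_date, property1)
);

-- Refresh yesterday's figures (run from cron, or fold into the load job).
INSERT INTO logdata_daily_summary (report_date, property1, row_count, total_qty)
SELECT DATE(dateandtime), property1, COUNT(*), SUM(some_quantity)
FROM logdata
WHERE dateandtime >= CURDATE() - INTERVAL 1 DAY
  AND dateandtime <  CURDATE()
GROUP BY DATE(dateandtime), property1;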

- Thanks Mark, http://stackoverflow.com/questions/5762813/adding-composite-indexes-on-mysql-table – user557348 Apr 23 '11 at 07:12
The time limit can be temporarily turned off for a particular script, if you know that it is going to potentially run over the time limit, by calling set_time_limit(0); at the start of your script.
Other considerations such as indexing or archiving very old data to a different table should also be looked at.

Your best bet is something like MongoDB or CouchDB, both of which are non-relational databases oriented toward storing massive amounts of data. This is assuming that you've already tweaked your MySQL installation for performance and that your situation wouldn't benefit from parallelization.

- Not sure you can come to that conclusion on the basis of the original question. It depends very much on the type of data that's being stored, surely? – Gary Barnett Apr 21 '11 at 12:55
- The issue in this case is not massive amounts of data... 100,000 rows is a small volume of data, and is only a problem if the table isn't well indexed or the queries are badly optimized. – Mark Baker Apr 21 '11 at 12:56
- He said 100,000 rows/day. If my experience of MySQL is relevant, at that rate the table is going to start performing very poorly after a year (36 million rows). – Mike Axiak Apr 21 '11 at 13:02
- Unless you partition the table, which is the very same thing NoSQL-based solutions do with larger volumes of data. – Michael J.V. Apr 21 '11 at 13:22
- You will note that I specified that my advice was only valid if the poster has already optimized his MySQL configuration. – Winfield Trail Apr 21 '11 at 13:29
- There are plenty of billion-row-plus MySQL databases out there. See http://serverfault.com/questions/80334/more-than-1-billion-rows-in-a-myisam-table – Gary Barnett Apr 21 '11 at 13:32
- He said "scalable in the long run," which (if he's getting 100k new rows daily as he said) means he is indeed going to outrun MySQL's ability to keep up eventually. – Winfield Trail Apr 21 '11 at 13:36
- It's 100k now, but it can reach a stage where it grows at 2-3 million rows daily. – user557348 Apr 22 '11 at 04:12
- @User: Then I restate my suggestion: look into MongoDB or CouchDB. Good luck. – Winfield Trail Apr 22 '11 at 04:18
- I am pretty sure this is not a reasonable answer; MongoDB and CouchDB don't support the kind of queries which can do what the OP requires, hence they require more code, and there's no guarantee it will be any faster, yet the OP would need to rewrite their entire app. – MarkR Apr 22 '11 at 19:06
- From the OP's original question: *How can I improve the performance **by shifting to something else than MYSQL** which is scalable in the long run* – Winfield Trail Apr 22 '11 at 22:00