What's the best way for generating mysql reports in huge databases?

Question

I am wondering about the best way for generating mysql reports in large databases, as I have a database for POS application with more than 40 stores working on it.

The database has more that 1.5M rows for each of four tables. two for headers and other for details.

I am generating reports by joining headers with details and some other tables to get full info for the view.

I tried to archive the data in one table, which has all data required for reporting, but I found it's a huge load, and MySQL events for fetching data to that table not always working, and may lead to data loss.

I've also tried indexing tables, but didn't help too much as the queries are too big and take too much time, which lead to heavy load on the server and may stop the application with not responding at all.

I searched across google, and found some ideas about partitioning tables, and others about archiving, changing the whole engine or even upgrading server requirements.

The relation between two tables (invoice_header and invoice_detail) is (one to many) that the invoice_header is the header of an invoice, with only totals. Which is linked to invoice_detail using location ID (loc_id) and Invoice number (invo_no), as each location has its own serial number. The invoice detail contains the details of each invoice.

Sample Query:

The query takes too long (15 - 20) seconds to fetch

-Total rows: 1495873

-Total Fetched rows: 9 - 12

SELECT SUM(invoice_detail.qty) AS qty, Month(invoice_header.date) AS month
FROM invoice_detail
JOIN invoice_header ON invoice_detail.invo_no = invoice_header.invo_no
AND  invoice_detail.loc_id = invoice_header.loc_id
WHERE invoice_detail.item_id = {$itemId}
GROUP BY Month(invoice_header.date)
ORDER BY Month(invoice_header.date)

EXPLAIN:

explain

invoice_header table structure:

CREATE TABLE `invoice_header` (
 `invo_type` varchar(1) NOT NULL,
 `invo_no` int(20) NOT NULL AUTO_INCREMENT,
 `invo_code` varchar(50) NOT NULL,
 `date` date NOT NULL,
 `time` time NOT NULL,
 `cust_id` int(11) NOT NULL,
 `loc_id` int(3) NOT NULL,
 `cash_man_id` int(11) NOT NULL,
 `sales_man_id` int(11) NOT NULL,
 `ref_invo_no` int(20) NOT NULL,
 `total_amount` decimal(19,2) NOT NULL,
 `tax` decimal(19,2) NOT NULL,
 `discount_amount` decimal(19,2) NOT NULL,
 `net_value` decimal(19,2) NOT NULL,
 `split` decimal(19,2) NOT NULL,
 `qty` int(11) NOT NULL,
 `payment_type_id` varchar(20) NOT NULL,
 `comments` varchar(255) NOT NULL,
 PRIMARY KEY (`invo_no`,`loc_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20286 DEFAULT CHARSET=utf8

invoice_header table structure:

invoice_detail table structure:

CREATE TABLE `invoice_detail` (
 `invo_no` int(11) NOT NULL,
 `loc_id` int(3) NOT NULL,
 `serial` int(11) NOT NULL,
 `item_id` varchar(11) NOT NULL,
 `size_id` int(5) NOT NULL,
 `qty` int(11) NOT NULL,
 `rtp` decimal(19,2) NOT NULL,
 `type` tinyint(1) NOT NULL,
 PRIMARY KEY (`invo_no`,`loc_id`,`serial`),
 KEY `item_id` (`item_id`),
 KEY `size_id` (`size_id`),
 KEY `invo_no` (`invo_no`),
 KEY `serial` (`serial`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

invoice_detail table structure:

After adding "Extract":

EXPLAIN SELECT SUM( invoice_detail.qty ) AS qty, Month( invoice_header.date ) AS
MONTH
FROM invoice_detail
JOIN invoice_header ON invoice_detail.invo_no = invoice_header.invo_no
AND invoice_detail.loc_id = invoice_header.loc_id
WHERE invoice_detail.item_id =11321
GROUP BY EXTRACT(
YEAR_MONTH FROM invoice_header.date )

I am using a quite good dedicated server with:

Intel Xeon Quad Core 3.3GHz (8 threads)
1 Gbps Uplink
16 GB RAM
1,000 GB RAID-1 Drives
25 TB Bandwidth

Any suggestions?

It's impossible to help you without more information (table structure, etc.). You must use indexes, you need to run `EXPLAIN` on each problematic query, and you may need to offload reporting to one or more slave servers. But again, without knowing what data you have, how you're doing your JOINs, and what indexes you have, we'd just be guessing about your best approach. — elixenide, Oct 18 '15 at 01:42
Would adding an index to `invoice_header.date` help? And maybe substring compare can be faster as well, see: http://stackoverflow.com/questions/508791/mysql-query-group-by-day-month-year — mevdschee, Oct 19 '15 at 21:33
@mevdschee indexing date didn't help as well, but I didn't get the substring compare you mentioned, could you pleas clarify it more? — CairoCoder, Oct 20 '15 at 00:25
Maybe the `GROUP BY EXTRACT(YEAR_MONTH FROM record_date)` will be faster as the linked post suggests. String comparison would be comparing the date with a string using string matching. Can you show the execution plan after adding the index? Still filesort? — mevdschee, Oct 20 '15 at 00:29
Ordering or grouping on a calculated field is quite expensive as that is not indexed. I guess that is the main problem. Can't you leave out the `MONTH()` function from the group and order by? NB: The following post gives some insight in optimizing your execution plan: http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by — mevdschee, Oct 20 '15 at 00:32
@mevdschee also extract example didn't help. will take a look on second link about optimization. — CairoCoder, Oct 20 '15 at 00:35
Yes, I meant `EXPLAIN SELECT ` when I referred to the execution plan. — mevdschee, Oct 20 '15 at 00:38
Just a little insight which I learned a few days ago: Having the right index alone might not be enough. Sometimes the MySQL optimizer does not choose the right index for a query. In my case it helped to tell MySQL which index to use by adding ...USE INDEX(...)... in my query. — Arno, Dec 20 '20 at 08:42

What's the best way for generating mysql reports in huge databases?

0 Answers0