We have a database table containing about 600 million rows of organisation records. One organisation may have multiple rows in the records table, which is where key status changes for each organisation are logged.
We currently have a SQL query running in a loop, because it takes so long to run and we need the information kept up to date in a cache.
I'm looking for any suggestions that might improve the efficiency of the query. Our biggest concern is the 50+ TB that the query writes to the SSDs every day; our next concern is how long the query takes to run. I'd be very happy if either concern could be addressed.
This is the query:
SELECT DISTINCT organisation_id
FROM records AS a
WHERE a.status != 'private'
  AND NOT EXISTS (
    SELECT 1
    FROM records
    WHERE status = 'insolvent'
      AND a.organisation_id = organisation_id
  );
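For anyone who wants to suggest a rewrite, here is a logically equivalent conditional-aggregation form I could also benchmark. This is only a sketch to show the intent of the query (one row per organisation that has at least one non-'private' row and no 'insolvent' row); I haven't verified yet whether it behaves any better on the full data:

SELECT organisation_id
FROM records
GROUP BY organisation_id
HAVING SUM(status = 'insolvent') = 0
   AND SUM(status != 'private') > 0;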
The status column is an ENUM and the organisation_id column is an INT. There are other columns, but I don't think they're relevant to this query.
These are the MySQL configuration settings:
tmp_table_size = 8589934590
max_heap_table_size = 8589934590
temptable_max_ram = 8589934590
innodb_flush_log_at_trx_commit = 1
innodb_stats_on_metadata = OFF
innodb_buffer_pool_size = 268435456
innodb_buffer_pool_chunk_size = 268435456
innodb_buffer_pool_instances = 1
innodb_log_file_size = 50331648
innodb_log_buffer_size = 16777216
innodb_page_size = 16384
innodb_file_per_table = ON
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4
We recently upgraded from an old server with 8 GB of RAM to one with 32 GB. I'm doing my testing on my MacBook; the live server hasn't had any configuration changes yet. @Rick James is right in pointing out that these settings are screwed up. To explain the tmp_table_size, max_heap_table_size, and temptable_max_ram settings: these are values I tried on the MacBook where I'm running tests, hoping they would reduce the heavy disk writes by keeping the temp table in RAM instead of on disk. I left them at 8 GB so that anyone about to suggest increasing them can see I've already tried increasing them enormously.
Regarding the huge 50+ TB of writes per day: this is a figure reported by Activity Monitor on my laptop, and we weren't aware it was happening until I started running test optimisations against a copy of the data on my MacBook. A single run of the query wrote about 3.5 TB to disk. Disk usage followed a pattern of consuming a couple of GB, then freeing a couple of GB, and repeating.
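In case it helps narrow down where the writes are coming from, these are the standard MySQL counters and variables I can check and post values for (they show whether the query is spilling to on-disk temporary tables, and where those files get written):

SHOW GLOBAL STATUS LIKE 'Created_tmp%';  -- Created_tmp_disk_tables vs Created_tmp_tables
SHOW VARIABLES LIKE 'tmpdir';            -- location of on-disk temporary tables/files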
CREATE TABLE `records` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`status` enum('private','insolvent', ...44more) NOT NULL,
`organisation_id` int unsigned NOT NULL,
`report_id` int unsigned NOT NULL,
`datetime` int unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1113992619 DEFAULT CHARSET=utf8 row_format=COMPRESSED
ROW_FORMAT=COMPRESSED is just something we did as a temporary measure to save space on the old server; this is a side project and the database was mostly just ingesting data for its first three years.
You’ll notice that the AUTO_INCREMENT is up to 1.1 billion even though I said about 600 million rows above; that's because I simply couldn't fit all the data onto my MacBook, so only the last 1.5 years of data is on my laptop for testing.
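If the answer turns out to be a missing secondary index, something along these lines is what I would expect to test; the index names are just placeholders, and I haven't tried it on the full table yet because of the time and extra disk writes an ALTER of this size would involve:

ALTER TABLE records ADD INDEX idx_org_status (organisation_id, status);
-- or, to cover lookups by status first:
ALTER TABLE records ADD INDEX idx_status_org (status, organisation_id);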
Need any more info? Just ask.