1

We have a database table containing about 600 million rows of organisation records. One organisation may have multiple rows in the records table, which is where key status changes for the organisations are logged.

We currently have a SQL query running in a loop because it takes so long to run and we need the information kept updated in a cache.

I'm looking for any suggestions that might improve the efficiency of the query. Our biggest concern is the 50+ TB that the query writes to the SSDs every day, and our next concern is how long the query takes to run. I'd be very happy if either concern could be addressed.

This is the query:

SELECT DISTINCT organisation_id 
FROM records AS a 
WHERE a.status != 'private' 
AND NOT EXISTS (
    SELECT 1 
    FROM records 
    WHERE status = 'insolvent' AND a.organisation_id = organisation_id);

The status column is an ENUM. The organisation_id column is an INT. There are other columns, but I don't think they're relevant to this SQL query.

These are the MySQL configurations:

tmp_table_size = 8589934590
max_heap_table_size = 8589934590
temptable_max_ram = 8589934590
innodb_flush_log_at_trx_commit = 1
innodb_stats_on_metadata = OFF
innodb_buffer_pool_size = 268435456
innodb_buffer_pool_chunk_size = 268435456
innodb_buffer_pool_instances = 1
innodb_log_file_size = 50331648
innodb_log_buffer_size = 16777216
innodb_page_size = 16384
innodb_file_per_table = ON
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4

We recently upgraded from an old server with 8GB of RAM to one with 32GB. I’m doing testing on my MacBook, and the live server hasn’t had any configuration changes yet. @Rick James is right in pointing out that these settings are screwed up. Let me first explain the tmp_table_size, max_heap_table_size, and temptable_max_ram settings: these are values I tried on my MacBook, where I’m running tests, in the hope that they would reduce the high disk writes by using an in-RAM temp table instead of an on-disk temp table. I left those three settings at 8GB so that anyone about to suggest raising them could see I’ve already tried increasing them enormously.
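For reference, the standard MySQL status counters for checking whether temp tables are actually spilling to disk (not specific to this query) are shown below; comparing them before and after one run indicates how many temp tables went to disk despite the large in-memory limits.

-- growth in Created_tmp_disk_tables (vs Created_tmp_tables) shows
-- how many temp tables spilled to disk during the run
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';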

With regards to the huge 50+ TB of writes per day: this is a metric reported by Activity Monitor on my laptop, and we weren’t aware it was happening until I started doing test optimisations with a copy of the data on my MacBook. For the query to complete just once, it wrote about 3.5 TB to the disk. The disk usage followed a pattern of using up a couple of GB, then freeing a couple of GB, and repeating.

CREATE TABLE `records` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `status` enum('private','insolvent', ...44more) NOT NULL,
  `organisation_id` int unsigned NOT NULL,
  `report_id` int unsigned NOT NULL,
  `datetime` int unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1113992619 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

Row format compressed is just something we did as a temporary measure to save space on our old server since this is a side project and the database was mostly ingesting data for the first 3 years.

You’ll notice that the auto-increment is up to 1.1 billion while above I said 600 million; this is because I simply couldn’t fit all the data onto my MacBook, so only the last 1.5 years of data is on my laptop for testing.

Need any more info provided? Ask me.

  • Would it maybe make sense to have a separate table for each status? – Christian Proetti Feb 25 '20 at 02:18
  • If you have only 600M rows, 50TB/day sounds like a typo. – Rick James Feb 25 '20 at 07:17
  • How many records for each organization? If only one, then that query can be greatly simplified. – Rick James Feb 25 '20 at 07:19
  • Please provide `SHOW CREATE TABLE records` and `SHOW TABLE STATUS LIKE 'records';` – Rick James Feb 25 '20 at 07:20
  • How much RAM do you have? Those settings look screwed up. – Rick James Feb 25 '20 at 07:22
  • The 50TB it’s writing is to a temp table while the query is running in a loop to update the cache. That much data isn’t actually being stored. – Christian Proetti Feb 26 '20 at 01:55
  • The system has 32GB of RAM. I was increasing the first 3 settings in the hope it wouldn’t be writing to disk for the temp table, but I think the documentation says that the subqueries in my query will always use an on-disk temp table. – Christian Proetti Feb 26 '20 at 02:00
  • 50TB = 50,000,000,000,000. What metric is giving you that value? I don't think any SSD can handle that many bytes in one day. Or even one week. – Rick James Feb 26 '20 at 21:56
  • The huge writes to the disk are not something we were aware of, or at least not to this extent, until I copied the database over to my MacBook and tried some ways to optimise it. To answer your question about “What metric is giving you this value”: I just happened to notice it in Activity Monitor on my MacBook while running the query in my post. For the query to complete just once, it wrote about 3.5 TB to the disk. The disk usage seemed to follow a pattern of using up a couple of GB, then removing a couple of GB, and repeating. – Christian Proetti Feb 27 '20 at 02:18
  • Have you added the suggested indexes? – Rick James Feb 28 '20 at 00:37
  • No, I was only doing one thing at a time. The first thing I tried was running the query with the LEFT JOIN IS NULL method that @Barmar suggested, but that was running for 30 hours before I gave up and stopped it. Now I can move on to the 2nd suggestion, which is adding the multi-column index that both you and Barmar have suggested. – Christian Proetti Feb 28 '20 at 02:54

3 Answers

2

Try using the LEFT JOIN/NULL pattern instead of NOT EXISTS.

SELECT DISTINCT a.organisation_id
FROM records AS a
LEFT JOIN records AS b ON a.organisation_id = b.organisation_id AND b.status = 'insolvent'
WHERE a.status != 'private' AND b.organisation_id IS NULL

Also, it will probably help if you have a multi-column index on (organisation_id, status).
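A sketch of adding that index (the index name is just illustrative):

-- builds a secondary index; on ~600M rows this will take a while
ALTER TABLE records ADD INDEX idx_org_status (organisation_id, status);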

Barmar
  • Can you please explain the last part “AND b.organisation_id IS NULL” as I don’t understand that. – Christian Proetti Feb 25 '20 at 02:38
  • That's how you tell when a LEFT JOIN doesn't match, since the columns from the second table are all NULL. See https://stackoverflow.com/questions/21633115/return-row-only-if-value-doesnt-exist?lq=1 – Barmar Feb 25 '20 at 02:43
  • This is a standard pattern that all SQL programmers should become familiar with. – Barmar Feb 25 '20 at 02:43
  • Thank you. The query is running now on a copy of the database. I think the IS NULL is where I went wrong when trying the LEFT JOIN myself. – Christian Proetti Feb 25 '20 at 02:47
  • @ChristianProetti - The index mentioned would also help with the `NOT EXISTS` in your Question. – Rick James Feb 26 '20 at 18:04
  • @Barmar I had tried your query when you posted this answer; however, it’s now been running for 60 hours and still not finished. I did not add a multi-column index yet, but neither did I have one when using the NOT EXISTS method. Can you think of any reason why the query has not finished after 60 hours? – Christian Proetti Feb 27 '20 at 23:56
1

With 32GB of RAM, you should probably set innodb_buffer_pool_size to 25G, not 256M. This may help. Raising those other 3 settings to 8G is dangerous; they should not be more than, say, 256M.

running in a loop

Please elaborate.

You need INDEX(status, organisation_id), in this order.

Please provide EXPLAIN SELECT ... and SHOW CREATE TABLE records.

8GB

If you are limiting MySQL to 8GB:

  • set innodb_buffer_pool_size to 70% of 8GB
  • set the first 3 settings to 1% of 8GB
  • leave everything else at the defaults

Such settings should be a decent compromise between "efficient" and "safe". Most sites don't need to do further tuning.
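As an illustration only (not drop-in values), an 8GB-capped configuration along those lines might look like this:

innodb_buffer_pool_size = 5G      # roughly 70% of the 8GB granted to MySQL
tmp_table_size          = 80M     # the three in-memory temp table limits kept small (~1% of 8GB)
max_heap_table_size     = 80M
temptable_max_ram       = 80M
# everything else left at its defaults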

Swapping is especially bad for MySQL. So if you need to shrink MySQL to avoid swapping, adjusting the buffer_pool is the quick and easy way to make room.

Rick James
  • On the live server the tmp_table_size, max_heap_table_size, and temptable_max_ram settings are not defined, so they are just whatever the default is, and I’d be happy if you could recommend what these should be set to. With regards to the other settings I showed, these too will need to be adjusted since we’ve moved from an 8GB RAM server to one with 32GB of RAM; however, we only want MySQL to use no more than 8GB of RAM since we have some other heavy applications running. – Christian Proetti Feb 27 '20 at 02:25
  • With regards to my comment about the query running in a loop: in our application we have it indefinitely running this query and updating the cache. When the application needs data to serve a user request, it loads the data from the cache rather than making the user sit there waiting for a possibly multi-hour query to complete. The query in my initial post is just one of many queries that need to be run in a loop and written to the cache; this was just the one that stood out the most as taking so long. – Christian Proetti Feb 27 '20 at 02:26
  • @ChristianProetti - I added to my answer. What implements the "cache"? – Rick James Feb 27 '20 at 06:00
  • Serving user requests such as web pages, the user-facing control panel, API queries, etc. runs on Cloudflare Workers. There’s a hidden backend, one of our servers running PHP and MySQL, which the Workers communicate with in order to update the data stored on Cloudflare’s end. The “cache” is really just a KV store (JSON file) containing the structured results of various queries. One such cached result is for this query, where the data is renewed in a loop so it’s usually no older than a few hours. – Christian Proetti Feb 28 '20 at 00:09
0

Without knowing the rate of record inserts into your database, it MIGHT make sense to handle all this through database triggers on insert/update/delete.

Here is what I am proposing: have a separate table with nothing but an organisation_id column. Maybe call the table something like NonPrivateInsolvent (an exaggerated name, but just to follow along).
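A minimal sketch of such a table (the column type mirrors the records schema; the primary key is assumed so the trigger below can INSERT IGNORE and DELETE by organisation):

CREATE TABLE NonPrivateInsolvent (
  organisation_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (organisation_id)
) ENGINE=InnoDB;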

Then a trigger for each of insert, update, and delete, basically something like the following (untested, as I am not at a MySQL machine):

DELIMITER //
DROP TRIGGER IF EXISTS MyCacheTrigger//

CREATE DEFINER=`root`@`localhost` TRIGGER MyCacheTrigger
    AFTER UPDATE ON `records`
    FOR EACH ROW
BEGIN
   -- "NEW" refers to the incoming row value, so only the single
   -- organisation that was just touched needs to be re-tested.
   -- The checks are filtered on one organisation ID and should be very fast
   -- (given an index starting with organisation_id).
   IF EXISTS (SELECT 1
                FROM records
               WHERE organisation_id = NEW.organisation_id
                 AND status != 'private')
      AND NOT EXISTS (SELECT 1
                        FROM records
                       WHERE organisation_id = NEW.organisation_id
                         AND status = 'insolvent')
   THEN
      -- the organisation qualifies: make sure it is in NonPrivateInsolvent
      INSERT IGNORE INTO NonPrivateInsolvent (organisation_id)
      VALUES (NEW.organisation_id);
   ELSE
      -- it no longer qualifies: remove it from NonPrivateInsolvent
      DELETE FROM NonPrivateInsolvent
       WHERE organisation_id = NEW.organisation_id;
   END IF;
END//

DELIMITER ;

(similar for insert/update/delete)

Now, all you have to do is query directly from the NonPrivateInsolvent table and it should be immediate (relatively). It may take a while to do an initial load of this table from all your existing data, but after that it might help your overall performance.

The benefit of doing this as a table maintained by the triggers is that it is self-maintained, and it only checks an organisation IMMEDIATELY when something happens to that organisation. If you have 1M organisations and only 350K are actively getting hit, you never have to keep re-querying all 1M, as they will already be in the table waiting.

Now, you could even go a step further and add a date column to this NonPrivateInsolvent table for last activity, so the record is stamped with the time as well. Then, if you wanted, you could do follow-ups such as finding any account with no activity in the last 6 months, for example...
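For example (the column name and the 6-month cutoff are only illustrative):

ALTER TABLE NonPrivateInsolvent
  ADD COLUMN last_activity DATETIME NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- follow-up: qualified organisations with no status activity in the last 6 months
SELECT organisation_id
FROM NonPrivateInsolvent
WHERE last_activity < NOW() - INTERVAL 6 MONTH;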

I don't know your data, its original purposes, etc., but with this volume of data it might be an option to consider. I will try to add sample insert/update/delete triggers later, when I'm at a machine able to do so.

Looking at your reference of 600M records over 1.5 years, that is about 13 records per second, based on 600M / (60 sec per minute * 60 minutes per hour * 24 hrs per day * 365 days * 1.5 yrs). Without specifics, what type of data is being gathered at such a high rate of inserts per second?

Another option might be a query you run every hour on some timer/schedule, storing the "last ID" or last time executed so you can pick up from that point on. Even an hour's worth would be 60 sec * 60 min * 13 recs per second = less than 50k records, so a stored procedure querying against roughly 47,000 records might be a better option.
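A rough sketch of that incremental approach, assuming the application stores a bookmark of the last processed id between runs (the variable and its value here are illustrative):

SET @last_seen_id = 1100000000;   -- bookmark saved after the previous pass

-- only the organisations touched since the last pass need re-checking
SELECT DISTINCT organisation_id
FROM records
WHERE id > @last_seen_id;

-- re-run the private/insolvent test for just those organisations,
-- update the cache table, then store MAX(id) as the new bookmark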

If the full query would otherwise take hours, these smaller segments might be a better avenue for keeping things current. Maybe a combination of these options would help always keep your "latest" qualified accounts current, such as within the last hour.

DRapp