
My app gets stuck for hours on simple queries like:

SELECT COUNT(*) FROM `item`

Context:

  • This table is around 200 GB+ and has 50M+ rows.
  • We have an RDS instance on AWS with 2 vCPUs and 16 GiB RAM (db.r6g.large).

This is the table structure from a SQL dump:

/*

 Target Server Type    : MySQL
 Target Server Version : 80023
 File Encoding         : 65001

*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
DROP TABLE IF EXISTS `item`;
CREATE TABLE `item` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `status` tinyint DEFAULT '1',
  `source_id` int unsigned DEFAULT NULL,
  `type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `url` varchar(2048) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `title` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sku` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `price` decimal(20,4) DEFAULT NULL,
  `price_bc` decimal(20,4) DEFAULT NULL,
  `price_original` decimal(20,4) DEFAULT NULL,
  `currency` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `image` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `time_start` datetime DEFAULT NULL,
  `time_end` datetime DEFAULT NULL,
  `block_update` tinyint(1) DEFAULT '0',
  `status_api` tinyint(1) DEFAULT '1',
  `data` json DEFAULT NULL,
  `created_at` int unsigned DEFAULT NULL,
  `updated_at` int unsigned DEFAULT NULL,
  `retailer_id` int DEFAULT NULL,
  `hash` char(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `count_by_hash` int DEFAULT '1',
  `item_last_update` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `sku_retailer_idx` (`sku`,`retailer_id`),
  KEY `updated_at_idx` (`updated_at`),
  KEY `time_end_idx` (`time_end`),
  KEY `retailer_id_idx` (`retailer_id`),
  KEY `hash_idx` (`hash`),
  KEY `source_id_hash_idx` (`source_id`,`hash`) USING BTREE,
  KEY `count_by_hash_idx` (`count_by_hash`) USING BTREE,
  KEY `created_at_idx` (`created_at`) USING BTREE,
  KEY `title_idx` (`title`),
  KEY `currency_idx` (`currency`),
  KEY `price_idx` (`price`),
  KEY `retailer_id_title_idx` (`retailer_id`,`title`) USING BTREE,
  KEY `source_id_idx` (`source_id`) USING BTREE,
  KEY `source_id_count_by_hash_idx` (`source_id`,`count_by_hash`) USING BTREE,
  KEY `status_idx` (`status`) USING BTREE,
  CONSTRAINT `fk-source_id` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1858202585 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

SET FOREIGN_KEY_CHECKS = 1;
  1. Could partitioning the table help with a simple query like this?
  2. Do I need to increase the RAM of the RDS instance? If so, what configuration do I need?
  3. Would NoSQL be a better fit for this kind of structure?
  4. Do you have any advice/solutions/fixes so the app can run these queries? (We would like to keep all the data and not erase it, if possible.)
cbdev
  • Given your indexes and depending on disk speed, the query should only take a few seconds. Check if you are maybe locking yourself somehow (e.g. your query waits for another transaction to finish). – Solarflare Jul 26 '22 at 12:35
  • Try `SELECT COUNT(id) FROM item`; if this does not speed things up, use a WHERE clause like `SELECT COUNT(id) FROM item WHERE id > 0 LIMIT 0,10`, then keep increasing the upper limit just to see where you gain or lose speed. – Heinz Jul 26 '22 at 12:53
  • See e.g. [Display open transactions in MySQL](https://stackoverflow.com/q/7598373). Also, another idea, does your app react to a timeout? Maybe your query actually took too long (30s, 60s?), and your app just kept waiting? – Solarflare Jul 26 '22 at 13:08
  • @Heinz MySQL "optimizes" (if you can call it that) `count(*)` on its own, so `count(*)` and `count(id)` are identical. Also, a limit in `count(id)... limit ...` has no effect (as count only returns 1 row), but trying to limit the rows read, e.g. `select count(*) from item where id < 10`, is indeed a good idea to check if the app and the query work in general. – Solarflare Jul 26 '22 at 13:12
  • @Solarflare @Heinz I tried adding a WHERE clause like this: `SELECT COUNT(*) FROM item WHERE id < 779358` (the lowest id is 779345) and it returns the answer immediately. What conclusion can I draw from this? – cbdev Jul 26 '22 at 13:20
  • @Solarflare While running `SHOW FULL PROCESSLIST` I only see one row with `SELECT COUNT(*) FROM `item``, one with `SHOW FULL PROCESSLIST`, and 4 others without any query in the `info` column. – cbdev Jul 26 '22 at 13:26
  • @cbdev What happens if you say `SELECT COUNT(*) FROM item WHERE id <> 0`? – Heinz Jul 26 '22 at 13:48
  • @Heinz I tried `SELECT COUNT(*) FROM item WHERE id <> 0`; it has already been running for 15 minutes... so no change, I guess. – cbdev Jul 26 '22 at 14:19
  • You will have to do some debugging. I would first try to keep increasing the id, e.g. `id < 779346` (should return "1" pretty fast; if not, you have the first clue), then `< 1M`, `< 2M` and so on. The time should increase about linearly with the rows, and if you know how long 100k and 500k rows take (assuming it works), you can estimate how long 50M rows need. If that's an hour, there is a problem, and we have a clue. If it works for 3M rows but not for 3.1M rows, we have a clue. If the 29s query works but the (estimated) 31s query doesn't, we have a clue. (A sketch of this probing appears after this comment thread.) – Solarflare Jul 26 '22 at 15:48
  • @cbdev Besides what Solarflare recommends, what I would do, if possible, is have someone analyze the query plan and/or create a copy of that table with only the primary key. Run the query and see if anything has changed for the better. If it's better, then add the unique keys back step by step. Who knows, probably it has to be like it is. – Heinz Jul 26 '22 at 17:21
  • @Heinz - None of those should help a simple `SELECT COUNT(*)`. That needs to scan the entire table; it does it with the "smallest" index available. `EXPLAIN SELECT ...` will probably be identical for all variants. In particular, it is probably using `INDEX(status)`, which is almost as small as `INDEX(id)` by itself. – Rick James Jul 26 '22 at 18:48
  • @cbdev - Were you running SHOW FULL PROCESSLIST as "root"? If not, you can't see all the running queries. – Rick James Jul 26 '22 at 18:52
  • @Solarflare - Probably none of the ideas you and others are bouncing around are of any use. It simply must scan all of one BTree with `id` in it -- that includes any of the secondary indexes. The only practical one for `WHERE id...` is the data itself (via the clustered PK); that is the worst BTree to use because it is the fattest. – Rick James Jul 26 '22 at 18:55
  • Hi @RickJames, I ran `SHOW FULL PROCESSLIST` again as root and it still does not show anything other than `SELECT COUNT(*) FROM `item`` (using Navicat). – cbdev Jul 27 '22 at 09:32
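
To make the probing described in the comments concrete, here is a sketch. The table name and the first id come from the question; the other bounds are arbitrary, and the timings are what you compare:

EXPLAIN SELECT COUNT(*) FROM `item`;             -- shows which index the scan uses

SELECT COUNT(*) FROM `item` WHERE id < 779346;   -- exactly one row; should be instant
SELECT COUNT(*) FROM `item` WHERE id < 1000000;  -- keep raising the bound and time
SELECT COUNT(*) FROM `item` WHERE id < 2000000;  -- each query; the time should grow
SELECT COUNT(*) FROM `item` WHERE id < 4000000;  -- roughly linearly with rows counted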

2 Answers


"SELECT COUNT(*) FROM item" needs to scan an index. The smallest index is about 200MB, so that seems like it should not take "minutes".

There are probably multiple queries that do full table scans. Such a scan bumps all the cached data out of the ~11GB buffer pool, and scanning 200GB means refilling that cache about 20 times over. That's a lot of I/O and a lot of elapsed time. Meanwhile, most other queries run slowly because their cached data keeps being bumped out.
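
A quick way to check whether the buffer pool is in fact being churned (a sketch; these are standard InnoDB status variables, and the pool size is the one discussed above):

SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- If Innodb_buffer_pool_reads (reads that missed the cache and hit disk) keeps
-- climbing relative to Innodb_buffer_pool_read_requests, the working set does
-- not fit in the buffer pool.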

The resolution:

  1. Locate these naughty queries. RDS probably gives you access to the "slowlog".
  2. Grab the slowlog and run `pt-query-digest` or `mysqldumpslow -s t` to find the "worst" queries (a slow-log sanity check is sketched after this list).
  3. Then we can discuss them.
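
Before digging into the digest, it is worth confirming the slow log is actually capturing queries. A minimal check (on RDS, these variables are normally set through the DB parameter group rather than `SET GLOBAL`):

SHOW VARIABLES LIKE 'slow_query_log';   -- should be ON
SHOW VARIABLES LIKE 'long_query_time';  -- lower it (e.g. to 1) to catch more queries
SHOW VARIABLES LIKE 'log_output';       -- FILE lets RDS expose the log for download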

There are some redundant indexes; removing them won't solve the problem, but it will save space and speed up writes. A rule: if you have `INDEX(a)` and `INDEX(a,b)`, you don't need the former.
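
Applied to the DDL in the question, that rule flags at least `retailer_id_idx` (a prefix of `retailer_id_title_idx`) and `source_id_idx` (a prefix of `source_id_hash_idx` and `source_id_count_by_hash_idx`). A sketch of the cleanup; verify against your own query patterns before dropping anything:

ALTER TABLE `item`
  DROP INDEX `retailer_id_idx`,  -- covered by retailer_id_title_idx
  DROP INDEX `source_id_idx`;    -- covered by source_id_hash_idx; the foreign key
                                 -- on source_id can still use that index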

If `hash` is some kind of scrambled value, it is likely that a single-row lookup (or update) will require a disk hit (and bump something else out of the cache).

`decimal(20,4)` takes 10 bytes and allows values up to 9,999,999,999,999,999.9999; that seems excessive. (Shrinking it won't save much space; it's something to keep in mind for the future.)
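
For example, if prices never exceed 99,999,999.9999, `decimal(12,4)` (6 bytes instead of 10) would suffice. A sketch under that assumption; note that this ALTER rebuilds the whole 200GB table:

ALTER TABLE `item`
  MODIFY `price` decimal(12,4) DEFAULT NULL,
  MODIFY `price_bc` decimal(12,4) DEFAULT NULL,
  MODIFY `price_original` decimal(12,4) DEFAULT NULL;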

I see that AUTO_INCREMENT has reached 1.8 billion. If there are only 50M rows, does the processing do a lot of DELETEs? Or maybe `REPLACE`s? IODKU (`INSERT ... ON DUPLICATE KEY UPDATE`) is better than `REPLACE`.
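
For context, `REPLACE` is a `DELETE` plus a re-`INSERT`: the replaced row gets a new AUTO_INCREMENT id and every index is touched twice, which is consistent with the 1.8-billion counter. A sketch against the `sku_retailer_idx` unique key (the column values are hypothetical):

-- REPLACE deletes the matching row and inserts a fresh one (new id, double index churn):
REPLACE INTO `item` (sku, retailer_id, price) VALUES ('ABC-123', 42, 19.9900);

-- IODKU updates the existing row in place and keeps its id:
INSERT INTO `item` (sku, retailer_id, price) VALUES ('ABC-123', 42, 19.9900)
ON DUPLICATE KEY UPDATE price = VALUES(price);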

Rick James
  • Thanks for your answer. I think I can't connect to RDS via SSH, so I don't know how I can run `pt-query-digest` or `mysqldumpslow -s t`, but I have enabled the AWS slow query log. What you said about cached data being bumped out could explain the problem, but I don't see any other queries in `SHOW FULL PROCESSLIST` or in the AWS slow query logs. Thanks for the field/index optimisations, I will correct them. Finally, yes, we are doing a lot of DELETEs; can this slow the queries? – cbdev Jul 27 '22 at 10:56
  • @cbdev - RDS may have a UI to let you download the slowlog. Then you can do the digesting on your machine. A single-row delete (assuming an adequate index) will not have much impact, but it will hit the disk several times and it will need at least one data block to work on. A million-row `DELETE` is terribly invasive. – Rick James Jul 28 '22 at 04:46

Thanks for all the advice here, but the problem was that we were using the MySQL `json` type for a very heavy column. Removing this column, or even changing it to `varchar`, made the `COUNT(id)` around 1000x faster (adding `WHERE id > 1` also helped).

Note: it was impossible to just drop the column as it was; we had to change it to `varchar` first.
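
For anyone hitting the same wall, the change looks roughly like this (a sketch: the varchar size is a guess and must exceed your longest JSON value, and each ALTER rebuilds the table, so run it in a maintenance window):

-- Step 1: convert away from json first (a direct DROP did not work for us):
ALTER TABLE `item` MODIFY `data` varchar(8192) DEFAULT NULL;
-- Step 2: drop the heavy column:
ALTER TABLE `item` DROP COLUMN `data`;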

cbdev