I'll try to explain my problem and what I meant by the title. I currently have a table with around 8 million rows. This table is highly active, meaning there are constant updates, inserts, and deletes, all caused by users (it's like a collecting game). That also means I need to make sure the data is displayed accurately.

So far I've looked into:

  • indexing
  • partitioning
  • sharding
  • mapreduce
  • optimize

I applied indexing, but I'm not sure I applied it correctly, and it doesn't seem to help as much as I expected.
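
For reference, this is a minimal way to check whether a query is actually using an index (table and column names are from the schema shown further down; the user id is just a placeholder):

EXPLAIN SELECT global_id FROM claim WHERE fk_user_id = 12345;
-- If the `key` column of the EXPLAIN output is NULL, no index is used
-- and MySQL scans the whole table.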

As I said, this table is highly active. If I partitioned it, there would be additional inserts/deletes to manage, and the whole process would become more complex than I can handle; I don't have much experience with databases.

Sharding is far too complex for me, and I only have one server to run this database on, so that option is a no-go.

As for MapReduce, I'm not entirely sure what it does, but as far as I understand it relates more to the application code than to the database itself.

I ran OPTIMIZE TABLE, but as far as I could tell it didn't have much effect either.

I have tried to avoid SELECT *, and I got rid of most DISTINCT, COUNT, and similar SQL functions so that they wouldn't slow the database down. However, even after narrowing down the data selected from each table, and from this table in particular, it's currently slower than it was before.

This table consists of:

CREATE TABLE `claim` (
   `global_id` bigint NOT NULL AUTO_INCREMENT,
   `fk_user_id` bigint NOT NULL,
   `fk_series_id` smallint NOT NULL,
   `fk_character_id` smallint NOT NULL,
   `fk_image_id` int NOT NULL,
   `fk_gif_id` smallint DEFAULT NULL,
   `rarity` smallint NOT NULL,
   `emoji` varchar(31) DEFAULT NULL,
   PRIMARY KEY (`global_id`),
   UNIQUE KEY `global_id_UNIQUE` (`global_id`),
   KEY `fk_claim_character_id` (`fk_character_id`),
   KEY `fk_claim_image_id` (`fk_image_id`),
   KEY `fk_claim_series_id` (`fk_series_id`),
   KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
   KEY `fk_claim_gif_id` (`fk_gif_id`) /*!80000 INVISIBLE */,
   KEY `fk_claim_rarity` (`rarity`) /*!80000 INVISIBLE */,
   KEY `fk_claim_emoji` (`emoji`),
   CONSTRAINT `fk_claim_character_id` FOREIGN KEY (`fk_character_id`) REFERENCES `character` (`character_id`) ON DELETE CASCADE ON UPDATE CASCADE,
   CONSTRAINT `fk_claim_image_id` FOREIGN KEY (`fk_image_id`) REFERENCES `image` (`image_id`) ON DELETE CASCADE ON UPDATE CASCADE,
   CONSTRAINT `fk_claim_series_id` FOREIGN KEY (`fk_series_id`) REFERENCES `series` (`series_id`) ON DELETE CASCADE ON UPDATE CASCADE,
   CONSTRAINT `fk_claim_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
 ) ENGINE=InnoDB AUTO_INCREMENT=7622452 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Is there another possible solution to speed up the database? If so, how? I'm currently at my wits' end and stuck on it. The database needs to respond preferably within 300 ms.

EXAMPLE SLOW QUERIES:

SELECT PK FROM <table> WHERE fk_user_id = ?;
SELECT PK FROM <table> WHERE fk_user_id = ? GROUP BY fk_character_id HAVING MAX(fk_character_id) = 1;
SELECT PK, fk_user_id, fk_character_id, etc, etc, etc FROM <table> WHERE fk_user_id = ? ORDER BY PK ASC LIMIT 0, 20
King Reload
  • What query are you running that is working below expected speed? Can you include the table structure? Can you include the output of explain select ...? – zedfoxus Mar 13 '21 at 13:06
  • 8 million records are by no means big data nowadays. 8 billion would probably be. Your question is unfortunately too vague to be of any concrete help, since the usual programming answer is to use indexes. You need to show us a query that underperforms and we can help you speed it up. You also need to take a look at MySQL's configuration as it can also have a big impact on performance. – Shadow Mar 13 '21 at 13:12
  • @zedfoxus almost all SELECT-related statements take over 10 seconds to respond, unless I add a LIMIT, that is. For example, getting all items from a specific user is no more than getting the PK with a WHERE on the FK of user. The table structure is as shown in the text; I don't have problems with the FKs it references. The output is confidential, as it's from actual users, but it returns what you would expect from the table example in the text. – King Reload Mar 13 '21 at 13:12
  • @Shadow I added some example queries as requested that I know run very slow. – King Reload Mar 13 '21 at 13:20
  • "This table consists of:" is NOT a definition of you table structure. Please post output of `SHOW CREATE TABE \G` for a (more) complete table definition.
    – Luuk Mar 13 '21 at 13:28
  • If the first one is slow, then I don't think you can do much from a programming point of view as foreign key fields are indexed. Depending on the number of records that query would return, you may look into paging (limiting the number of records returned at a time). You may have to look into mysql config instead. There already are lots of questions here on SO dealing with config, even though it is not really on topic here. So, pls do search before asking a question on config and if you do decide to ask a question, do it on the dba sister site of SO. – Shadow Mar 13 '21 at 13:28
  • @Luuk added that instead – King Reload Mar 13 '21 at 13:35
  • The first query will return all entries of a user? So what amount of data are we talking about here? Maybe the query itself isn't the slow part, but the large data transfer from the DB server to the client? – Thallius Mar 13 '21 at 13:56
  • `PRIMARY KEY` and `UNIQUE KEY` are similar, so you can drop `global_id_UNIQUE`. See: https://stackoverflow.com/questions/3844899/difference-between-key-primary-key-unique-key-and-index-in-mysql – Luuk Mar 13 '21 at 14:06
  • for a query with `WHERE fk_user_id = ? ORDER BY PK` I would expect a (unique) key with two fields `fk_user_id, global_id` . – Luuk Mar 13 '21 at 14:08
  • @ClausBönnhoff it only returns between 0-50k rows at best. But it still takes 17-30 seconds for it to do so. I'm at a loss and I am not good enough in databases to figure this out myself. – King Reload Mar 13 '21 at 14:13
  • Where is the server hosted? Is it a cost free one? Then this might be the problem – Thallius Mar 13 '21 at 14:14
  • @ClausBönnhoff it's not cost-free. I host it on a VPS. – King Reload Mar 13 '21 at 14:19
  • @Luuk - There is little advantage, and some disadvantage, in making (user_id, global_id) `UNIQUE`. `INSERTs` into a `UNIQUE` index must synchronously check uniqueness; `INSERTs` into a plain `INDEX` are done asynchronously via the "change buffer". – Rick James Mar 13 '21 at 16:11

1 Answer

Redundant

PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),

A PRIMARY KEY, in MySQL, is a UNIQUE KEY. So the UNIQUE KEY is redundant, wastes disk space, and slows down INSERT.
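
As a one-liner, using the index name from your SHOW CREATE TABLE:

ALTER TABLE claim DROP INDEX global_id_UNIQUE;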

Need VISIBLE index starting with user_id for Q1 and Q2

Replace this

   KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,

with

INDEX(fk_user_id, fk_character_id)

in that order -- this will help with your first 2 queries.
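
A sketch of that change as a single ALTER (the name idx_user_character is my own invention; MySQL lets you drop fk_claim_user_id in the same statement because the new index's leftmost column still covers that foreign key):

ALTER TABLE claim
  ADD INDEX idx_user_character (fk_user_id, fk_character_id),
  DROP INDEX fk_claim_user_id;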

Query 3

The 3rd query may still need (in the given order)

INDEX(fk_user_id, global_id)
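
For example (again, the index name is arbitrary):

ALTER TABLE claim ADD INDEX idx_user_pk (fk_user_id, global_id);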

If you need some of the DISTINCTs/COUNTs, let's see them. Changing indexes may help.

Strange query

As for

SELECT PK FROM <table> WHERE fk_user_id = ?;

Why would you just want the PK? Is global_id useful by itself? Or is it useful only for looking up something else? If the latter, let's see it; it is often more practical to optimize a single, complex, query than two queries that are artificially split.

Tuning

How much RAM is available to MySQL? What is the value of innodb_buffer_pool_size? 30s for 50K rows -- that sounds I/O-bound. Maybe that setting is too low.
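
To check the current value -- and, on MySQL 5.7.5+, to resize it without a restart. The figure below is only a placeholder; a common rule of thumb is roughly 70% of RAM on a dedicated database server:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Size is in bytes; 8GB here is a placeholder, not a recommendation.
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;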

In some cases, DISTINCT speeds up a query -- if for no other reason than that less data is shoveled back to the client.
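
For example, this hypothetical query ships one row per distinct character instead of one row per claim (the user id is a placeholder):

SELECT DISTINCT fk_character_id FROM claim WHERE fk_user_id = 12345;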

Redesign PK

Based on the names "claim" and "user_id" and the test for "user_id" in all 3 queries, I deduce that you are frequently looking up stuff for a single "user"? What, if anything, is global_id needed for outside this table?

If you need global_id elsewhere, or nothing else could be used for uniqueness, do

PRIMARY KEY(user_id, global_id), -- for locality of reference
INDEX(global_id)  -- to keep AUTO_INCREMENT happy

If (user_id, xx) is known to be unique (for some column(s) xx), toss global_id and change to

PRIMARY KEY(user_id, xx) 

In either case, these go away:

PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
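
Putting it together, a sketch of the rebuild as one statement (untested; it rewrites all ~8M rows, so expect it to take a while and lock the table -- and note that your column is named fk_user_id where I wrote user_id above):

ALTER TABLE claim
  DROP PRIMARY KEY,
  DROP INDEX global_id_UNIQUE,
  DROP INDEX fk_claim_user_id,
  ADD PRIMARY KEY (fk_user_id, global_id),
  ADD INDEX (global_id);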

InnoDB stores the data in PK order. By having the PK start with user_id, all the rows for one user are "adjacent" on the disk, thereby more readily cached in RAM (in the buffer_pool).

Given a user with 100 claims, I am restructuring the table so that the data is found in a couple of consecutive blocks (16KB unit of storage by InnoDB) instead of upwards of 100 scattered blocks.

Rick James
  • you are an absolute hero! I was struggling because the DISTINCT indeed didn't seem to be able to handle the amount of data well, but once I applied the indexes you suggested, it worked like a charm. – King Reload Mar 13 '21 at 16:47