
We are running out of primary keys for one of our tables in prod, and we do not want that to happen again. We need a mechanism that alerts us, based on the rate of insertion, when a table is about to run out of primary keys so that we can mitigate it.

I have written a Jenkins job that executes a query against information_schema and extracts the index length, table rows, and auto_increment value from the MySQL database. But we need to keep track of the previous reading, since we want to take the diff of it with the current reading in order to predict when the keys are going to be exhausted.

    SELECT INDEX_LENGTH, TABLE_NAME, AUTO_INCREMENT, TABLE_ROWS
    FROM information_schema.TABLES
    ORDER BY INDEX_LENGTH DESC;

I cannot find anything in MySQL related to the rate of change in tables that would predict the rate at which the primary keys are being used up. I am also open to any good alternative approach if anyone can think of one, but keep in mind that I need a query, run by a cron job or Jenkins job, that tells me which tables will be exhausted by the end of this year.

  • If you have to worry about running out of PK values, I'd say you probably want a bigger PK field. If that is not feasible for some reason... why not just insert the results of the query you've provided into a log table, along with a timestamp for when the data was pulled, that you can analyze for growth over time? – Uueerdo May 30 '19 at 22:02
  • Does your table have a `created_at` timestamp column? If so, you could calculate the difference between the MAX() and MIN() in that column. Or you could get the rate of rows inserted during any specified time period (see the sketch after these comments). – Bill Karwin May 30 '19 at 22:30
  • What datatype is your PK? – SE1986 May 30 '19 at 23:13
  • @Uueerdo, yep, that is one way, but it adds overhead. If I can write a SQL query that does it for me, that would be the best way to go. I am working on constructing a similar query, but taking the diff is an issue, as there is no column to track that. Making another table in prod or elsewhere is a bit of overhead. We could also put the metrics into CloudWatch or some other storage and take the diff there. – ali May 31 '19 at 09:14
  • @BillKarwin I want to get it from information_schema, since we have around 40 DBs in each deployment and we have multiple deployments, so going to each table and then getting MAX and MIN would take a lot of time. My focus is on information_schema since it has knowledge of all the DBs. – ali May 31 '19 at 09:17
  • Well, the information schema only records the current auto-increment value, not a history of PK values per time period. The information schema just doesn't have the information you want. – Bill Karwin May 31 '19 at 15:07
  • @BillKarwin I cannot post the query here, so I am pasting it in the answer below. Check it out. – ali Jun 03 '19 at 07:14
  • @BillKarwin There are two queries in the link I posted in the answer (in the comments section of that post). I ran the second query, since the first one was prompting for some input and I was unable to run it. – ali Jun 03 '19 at 07:19
  • Yes, you can find out how much of an integer you have used. I wrote [a similar query](https://github.com/billkarwin/bk-tools/blob/master/pk-full-ratio.sql) once. But that doesn't give you an estimate for the time you will run out. For that, you need to know the rate of usage. – Bill Karwin Jun 03 '19 at 14:58
  • If you only need to know how many values in the range of an integer you have used, and not the rate you're consuming them, then this question is a duplicate of https://stackoverflow.com/q/55064438/20860 – Bill Karwin Jun 03 '19 at 15:00
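
For reference, a minimal sketch of the `created_at` approach from the comments; `orders` and `created_at` are placeholder names, not from the question:

    -- Rough average insert rate per day, derived from the created_at range.
    -- GREATEST(..., 1) avoids division by zero when all rows share one day.
    SELECT COUNT(*) / GREATEST(DATEDIFF(MAX(created_at), MIN(created_at)), 1)
           AS rows_per_day
    FROM orders;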

2 Answers


Found a related query that gets the percentage of auto-increment values used up; see this link, in the comments section of the post.

    SELECT table_schema, table_name, column_name, column_type, auto_increment, max_int,
           ROUND(auto_increment / max_int * 100, 2) AS `USED %`
    FROM (
        SELECT table_schema, table_name, column_name, column_type, auto_increment,
               POW(2, CASE data_type
                          WHEN 'tinyint'   THEN 7
                          WHEN 'smallint'  THEN 15
                          WHEN 'mediumint' THEN 23
                          WHEN 'int'       THEN 31
                          WHEN 'bigint'    THEN 63
                      END + (column_type LIKE '% unsigned')) - 1 AS max_int
        FROM information_schema.tables t
        JOIN information_schema.columns c USING (table_schema, table_name)
        WHERE t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
          AND t.table_type = 'BASE TABLE'
          AND c.extra = 'auto_increment'
          AND t.auto_increment IS NOT NULL
        ORDER BY auto_increment DESC
        LIMIT 10
    ) TMP;
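
Since the end goal is an alert, a variant of the same query can be run from Jenkins or cron so that it only reports tables above a cutoff. This is a sketch; the 80% threshold is an arbitrary example, not from the original answer:

    -- Report only auto-increment columns that are more than 80% consumed;
    -- an empty result set means no table currently needs attention.
    SELECT table_schema, table_name, column_name,
           ROUND(auto_increment / max_int * 100, 2) AS used_pct
    FROM (
        SELECT table_schema, table_name, column_name, auto_increment,
               POW(2, CASE data_type
                          WHEN 'tinyint'   THEN 7
                          WHEN 'smallint'  THEN 15
                          WHEN 'mediumint' THEN 23
                          WHEN 'int'       THEN 31
                          WHEN 'bigint'    THEN 63
                      END + (column_type LIKE '% unsigned')) - 1 AS max_int
        FROM information_schema.tables t
        JOIN information_schema.columns c USING (table_schema, table_name)
        WHERE t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
          AND t.table_type = 'BASE TABLE'
          AND c.extra = 'auto_increment'
          AND t.auto_increment IS NOT NULL
    ) TMP
    WHERE auto_increment / max_int * 100 > 80
    ORDER BY used_pct DESC;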
– ali

You could simply create a table to store metrics, holding the values you are currently retrieving along with a timestamp.
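
A possible shape for such a table; this is a sketch, since the original answer leaves the DDL implicit, and the column names simply mirror the INSERT below:

    -- Metrics snapshot table. `when` must be backquoted because WHEN is a
    -- reserved word; with multiple schemas you would likely also store
    -- table_schema.
    CREATE TABLE table_metrics (
        table_name     VARCHAR(64) NOT NULL,
        index_length   BIGINT UNSIGNED,
        auto_increment BIGINT UNSIGNED,
        row_count      BIGINT UNSIGNED,
        `when`         DATETIME NOT NULL,
        PRIMARY KEY (table_name, `when`)
    );

Each run of the job then appends a snapshot: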

    INSERT INTO table_metrics (index_length, table_name, auto_increment, row_count, `when`)
    SELECT INDEX_LENGTH, TABLE_NAME, AUTO_INCREMENT, TABLE_ROWS, NOW()
    FROM information_schema.TABLES;

and then run a query on that data:

    SELECT table_name
        , il_change / daysPassed AS ilGrowthRate
        , ([literal ceiling value] - il_current) / (il_change / daysPassed) AS daysRemaining
        , ...
    FROM (
        SELECT table_name
            , MAX(index_length) AS il_current
            , MAX(index_length) - MIN(index_length) AS il_change
            , ...
            , DATEDIFF(MAX(`when`), MIN(`when`)) AS daysPassed
        FROM table_metrics
        GROUP BY table_name
    ) AS subQ

This assumes you remove the data for a table when you truncate it, so that the metrics start over when the table does; you can of course add conditions to only look at metrics from the last day, week, month, etc. (in which case pruning the metrics table after clears becomes less important).
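
For the primary-key question specifically, the same pattern can be pointed at the stored auto_increment snapshots. A sketch, assuming the table_metrics table above; the unsigned INT ceiling 4294967295 is hard-coded as an example and would be replaced per column type (e.g. via the max_int calculation in the other answer):

    -- Estimated days until AUTO_INCREMENT exhaustion, based on the growth
    -- between the oldest and newest snapshots of each table.
    SELECT table_name
        , ai_change / daysPassed AS aiGrowthPerDay
        , (4294967295 - ai_current) / (ai_change / daysPassed) AS daysRemaining
    FROM (
        SELECT table_name
            , MAX(auto_increment) AS ai_current
            , MAX(auto_increment) - MIN(auto_increment) AS ai_change
            , DATEDIFF(MAX(`when`), MIN(`when`)) AS daysPassed
        FROM table_metrics
        GROUP BY table_name
    ) AS subQ
    WHERE daysPassed > 0
      AND ai_change > 0;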

– Uueerdo