Get all available UIDs between two UIDs from a table

Question

I have a table with UID is the primary key. In the old system, it wasn't the primary key. So, people can insert data to that field, but we don't want to do it anymore.

In this table, I have a gap between UID 2000 and 2005 (2003 is taken). How do I get the list integers inside that gap?

UPDATED I actually don't want the list of consecutive numbers between 2 uids. Assuming that I may have some UIDs existing between 2 numbers but I don't know that. I just want to get the list of available UIDs between 2 UIDs

I want this list to return:

MISSING
2001
2002
2004

Related: http://stackoverflow.com/questions/186756/generating-a-range-of-numbers-in-mysql — Sumurai8, Jun 24 '15 at 20:37
And related: http://stackoverflow.com/questions/4386425/select-range-of-integers-in-mysql-eg-1-2-3-4-n — Sumurai8, Jun 24 '15 at 20:39
@DrewPierce Not, it's not troubling the system. People just want to fill up the gap because our UIDs are getting big. — lnguyen55, Jun 24 '15 at 20:46
A silly way is creating a temporary table with all numbers between a and b, then selecting all numbers in temporary table that are `NOT IN (SELECT id FROM users WHERE id BETWEEN 1 AND 100)`. — Sumurai8, Jun 24 '15 at 20:54
Can be as high as `9,223,372,036,854,775,807` like i said OCD thing — Drew, Jun 24 '15 at 21:00
So however many years of data (say n), you have n times 9 billion more years. The sun supernovas and we die way before that — Drew, Jun 24 '15 at 21:10
OK, OK, it's just an OCD thing :D. People want that, and I have no control. — lnguyen55, Jun 24 '15 at 21:11
If it is just about filling up the gaps, finding the lowest free spot might do the trick. [This question has a way](http://stackoverflow.com/questions/5016907/mysql-find-smallest-unique-id-available). The problem is that if 2 accounts are created at the same time, they could possibly end up with the same id. I wonder if the procedure in my first linked question can be altered to query the table for the existance of that particular id, but I have no experience with procedures. — Sumurai8, Jun 24 '15 at 21:13
@Sumurai8 Transactions can be used to prevent 2 accounts from trying to use the same ID. — Barmar, Jun 24 '15 at 21:15
@Barmar Using that solution on 2 million id's sounds like a nightmare to me though. — Sumurai8, Jun 24 '15 at 21:16
And I am assuming that you don't want to go through every single table, and find out where the account id is actually used. You could move the account table away, go through every row and insert it into a new account table. Then go through every other table where that id is used and change the id there too. That would work, because the new id is always <= current id, and thus every id will stay unique. It's a recipe for disaster though. — Sumurai8, Jun 24 '15 at 21:18
Actually, you could do that without moving the account table away. You just need to initiate a process that will do the moving, slowly and steadily. The last thing the process does is adjusting the auto_increment value to whatever the new value should be. You would need to invalidate any sessions that are going on to prevent those sessions using the wrong uid, and prevent logons using that uid while a new uid is assigned, but the possibility of data corruption is probably unacceptable. — Sumurai8, Jun 24 '15 at 21:35

score 2 · Accepted Answer · edited May 23 '17 at 11:43

2

See Generating a range of numbers in MySQL for how to create a table that lists all the numbers in a range. Then do:

set @start = 2000;
set @end = 2005;

SELECT n AS missing
FROM number_table AS nt
LEFT JOIN your_table AS t ON nt.n = t.uid
WHERE n BETWEEN @start AND @end
AND t.uid IS NULL

edited May 23 '17 at 11:43

Community

1
1

answered Jun 24 '15 at 21:17

Barmar

741,623
53
500
612

Don't get me wrong, but this is the last solution I want to go with. Our auto_increment step is larger than 1 (for some reason). Eventually, it will happen again. – lnguyen55 Jun 24 '15 at 21:29
What kind of solution are you looking for if this isn't it? You asked for a query to return a list of all the unused IDs in a range, this does exactly that. – Barmar Jun 24 '15 at 21:31
Yes, it solves the problem, but we have to create that table again with millions of records (uids) every time we want to find out the numbers – lnguyen55 Jun 24 '15 at 21:34
Put it in a stored procedure, to create the table with the numbers you need that time. – Barmar Jun 24 '15 at 21:36
Remember that once you have determined what the minimum number is, you can throw all numbers < minimum number. It's not that uid's disappear (all the time), right? RIGHT? – Sumurai8 Jun 24 '15 at 22:13
@Sumurai8 You are right. But I'd prefer to allow them to use range then I find out the available uids. They could never want to fill up the gap from the bottom. – lnguyen55 Jun 25 '15 at 20:19
@Barmar This solution is very close. What's about not creating a new table every time? – lnguyen55 Jun 25 '15 at 20:24
I can't think of any way to do that in SQL, it needs a table to join with. You could write a stored procedure with a loop that simply tests each number in the range with a `SELECT` query. – Barmar Jun 25 '15 at 20:43

Drew · Answer 2 · 2015-06-25T02:39:17.330

Summary of sections seen below.

Section 1 mimicks a table that has gaps in id like your question

Section 2 shows a fast way to put a 4 million row table together with incrementing pk. completely not used for this but perhaps useful. if it seems left like a half-thought it is because it was not useful entirely

Section 3 creates a table inspired by section 2 to leave you with a table of it is just a worktable where ordering in important, both to insert into it and processing it

The new id to use (the pk) Your current id (the one that is gap-prone) and a column that says whether or not has been processed so you can do them in batches section 3 is where the action is

xxxxxxxxxxxxxxxxxxxxxxxxxx

Section 1:

create table tbl1
(   // this mimicks your current table. naturally you have one already
    id bigint not null auto_increment primary key,
    thing varchar(100)  -- whatever columns you have
)engine=MyISAM;

insert tbl1(thing) values('a'),('a'),('b');
show table status from test;  -- auto_increment=4, max is 3

alter table tbl1 auto_increment=2000;
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=100000; -- 100k
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=110000; -- 110k
insert tbl1(thing) values('a'),('a'),('b');
alter table tbl1 auto_increment=2000000; -- 2m
insert tbl1(thing) values('a'),('a'),('b');
show table status from test;  -- auto_increment=2000003, max is 2000002

select count(*) from tbl1 -- 15 rows

xxxxxxxxxxxxxxxxxxxxxxxxxx

Section 2:

create table idFix
(   colIs bigint auto_increment primary key,    -- Is your Key
    colShouldBe bigint null    -- Should be your new Key
    processedYet tinyint null   -- 1 if processed
)engine=myisam;

insert into idFix(colIs) values(null);    -- prime it with 1 row

-- this is pretty fast, don't laugh
-- run the following line 22 times
insert into idFix(colIs) select null from idFix;

-- you now have 4.2m rows in tbl2  (4,194,304)
select count(*) from idFix

select count(*) from idFix
where colIs not in (select id from tbl1)
-- 4,194,289

xxxxxxxxxxxxxxxxxxxxxxxxxx

Section 3:

Backup data first. Then perform tests in a scratch database of the following

create table idFix2
(   yourIdShouldBe bigint auto_increment primary key,    -- This becomes what your new key should be
    yourIdIs bigint null,    -- This is what your gap-prone id is right now
    processedYet tinyint null   -- 1 if processed, null otherwise
)engine=myisam;

-- the order by is important
insert into idFix2(yourIdIs,processedYet)
select id,null from tbl1 order by id
-- again order by in above stmt is important

Now you have a table that what your key should be, what your key is, and processedYet is null.

Do them in batches in a stored proc or front end code (say java/c#, whatever)

It is important to do them top to bottom. any other way will screw up your data

Did i mention it is important to do it top to bottom?

I will leave my thought out of it about getting everyone out of system and requiring a table lock only you know your system not us.

select * 
from idFix2 
where processedYet is null and yourIdShouldBe<>yourIdIs
order by yourIdShouldBe -- order is important
limit 2 -- you might want to choose a bigger number :>

Did i mention it is important to do it top to bottom ??

Here is the flow using result set from above select stmt

(a) get next row in result set

(b) insert new parent record using data from tbl1 data back into tbl1 using data from row yourIdIs but the insert will be pk=yourIdShouldBe The insert will guarantee you won't have foreign key constraints in children tweaked below

(c) update children that use the old yourIdIs to hang under the new yourIdShouldBe in their tables (there can be scads of these tables). the children's foreign key constraints will be honored because the new parent row is in place already from step(b)

(d) delete the parent row from tbl1 where pk is yourIdIs. fear not that this will cause even more gaps because those will be filled based on looping thru (a) which will slot fill them (e) update idFix2 set processedYet=1 for the row your are processing from step (a) result set (f) GoTo (a)

When you have no more processedYet=null you are almost done

Set new auto_increment value to what it should be (1 more than max(id) in tbl1, let's call that number nnnn)

alter table tbl1 auto_increment=nnnn;

xxxxxxxxxxxxxxxxxxxxxxxxxx

Note the following

show table status from test where name like 'tbl1%';  -- auto_increment=2000003

I have nothing in slot4, 2000 will become slot 4

insert tbl1(id,thing) values(4,'stuff from record2000 you get the drift');

show table status from test where name like 'tbl1%';  -- auto_increment=2000003 is left as is

So you are free to fill the gaps without screwing with auto_increment until the end

There it is and your gaps go away. If it fails, consider taking a vacation day.

Oh, I forgot, you were testing this first in a scratch database anyway.

Good luck!

This proposed solution does not care about individual gaps, rather it focuses on fixing all yor data — Drew, Jun 25 '15 at 02:45
It seems like this solution does not help to resolve the problem with repeating procedures. — lnguyen55, Jun 25 '15 at 20:17
do you mean this line: `limit 2 -- you might want to choose a bigger number :>` that i wrote. if so, make it 3m — Drew, Jun 25 '15 at 20:19
but you test it first with a 100 rows. if you like it, make it 3m. the whole thing is done — Drew, Jun 25 '15 at 20:19
and you dont screw around with finding gaps onesie-twosie and some kludge like that. like using this gap today, finding another gap in a week. You just fix all the data and move on with life — Drew, Jun 25 '15 at 20:20
I meant this issue will happen frequently because our incremental step is not 1. So, that means there will be a hole every time new record inserted to the database. Then, they will want to fill up the gap again. — lnguyen55, Jun 25 '15 at 20:23
i think our thoughts are missing each other. the above approach removes all gaps, sets new auto_increment, and gaps won't happen again. assuming you can control the system is some fashion to say fix the data then say hey everyone log in — Drew, Jun 25 '15 at 20:25
as far as i know you are american express with millions of transactions per second which complicates issues — Drew, Jun 25 '15 at 20:26
there is nothing keeping you from creating table XYZ with the gap numbers in it. and on inserts to tbl1 you go get one of the gap numbers, mark it as used, use it in insert to tbl1. once all of XYZ is exhausted, you go back to the else part of the code that says hey, XYZ is all used up, so lets just do our normal insert the way we used to (at 2m and above). but that approach means SLOW filling of gaps — Drew, Jun 25 '15 at 20:29
it's really a no-brainer. just choose method A or B (comment above). Your OCD co-workers won't like plan B — Drew, Jun 25 '15 at 20:30

Get all available UIDs between two UIDs from a table

2 Answers2