4

I am trying to get a sequential number table from 1 to 20 million. (or 0 to 20 million)

I am rather awestruck at how difficult it's been to get a MySQL-compatible solution to this common problem.

Similar to this: Creating a "Numbers Table" in MySQL

But the answer only goes to 1 million. I am not really understanding the bit shift calculations.

I've seen many SQL answers but most are for databases that aren't MySQL, so I can't adopt the code due to lack of knowledge of both MySQL and the other.

Some references:

SQL, Auxiliary table of numbers

What is the best way to create and populate a numbers table?


Please make sure the code you post is compatible in MySQL and is semicolon delimited so I can run it in PhpMyAdmin. I'd appreciate the table to be named numbers with the column called i

I will benchmark each solution, so the it's archived and hopefully will show up for the next time someone tries to search for this problem.


Benchmarks thus far:

Times are in seconds.

+---------------+------------------+---------+-----------+------------+
|    Author     |      Method      | 10,000  | 1,000,000 | 20,000,000 |
+---------------+------------------+---------+-----------+------------+
| Devon Bernard | PHP Many Queries | 0.38847 | 39.32716  | ~ 786.54   |
| Bhare         | PHP Few Queries  | 0.00831 | 0.94738   | 19.58823   |
| psadac,Bhare  | LOAD DATA        | 0.00549 | 0.43855   | 10.55236   |
| kjtl          | Bitwise          | 1.36076 | 1.48300   | 4.79226    |
+---------------+------------------+---------+-----------+------------+
HoldOffHunger
  • 18,769
  • 10
  • 104
  • 133
ParoX
  • 5,685
  • 23
  • 81
  • 152
  • I don't think I'm going to be able to beat 10.5 - although generating the lines elsewhere sounds like cheating to me! ;-) – Strawberry Jan 13 '13 at 00:21

8 Answers8

3
-- To use the bitwise solution you need a view of 2 to the power 25.
-- the following solution is derived from http://stackoverflow.com/questions/9751318/creating-a-numbers-table-in-mysql
-- the following solution ran in 43.8 seconds with the primary key, without it 4.56 seconds.

-- create a view that has 2 to the power 25 minus 1

-- 2 ^ 1
CREATE or replace VIEW `two_to_the_power_01_minus_1` AS select 0 AS `n` union all select 1 AS `1`;

-- 2 ^ 2
CREATE or replace VIEW `two_to_the_power_02_minus_1` 
AS select
   ((`hi`.`n` << 1) | `lo`.`n`) AS `n`
from (`two_to_the_power_01_minus_1` `lo` join `two_to_the_power_01_minus_1` `hi`) ;

-- 2 ^ 4
CREATE or replace VIEW `two_to_the_power_04_minus_1` 
AS select
   ((`hi`.`n` << 2 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_02_minus_1` `lo` join `two_to_the_power_02_minus_1` `hi`) ;

-- 2 ^ 8
CREATE or replace VIEW `two_to_the_power_08_minus_1` 
AS select
   ((`hi`.`n` << 4 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_04_minus_1` `lo` join `two_to_the_power_04_minus_1` `hi`) ;

-- 2 ^ 12
CREATE or replace VIEW `two_to_the_power_12_minus_1` 
AS select
   ((`hi`.`n` << 8 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_08_minus_1` `lo` join `two_to_the_power_04_minus_1` `hi`) ;

-- 2 ^ 13
CREATE or replace VIEW `two_to_the_power_13_minus_1`
AS select
   ((`hi`.`n` << 1) | `lo`.`n`) AS `n`
from (`two_to_the_power_01_minus_1` `lo` join `two_to_the_power_12_minus_1` `hi`);



-- create a table to store the interim results for speed of retrieval
drop table if exists numbers_2_to_the_power_13_minus_1;

create table `numbers_2_to_the_power_13_minus_1` (
  `i` int(11) unsigned
) ENGINE=myisam DEFAULT CHARSET=latin1 ;

-- faster 2 ^ 13
insert into numbers_2_to_the_power_13_minus_1( i )
select n from `two_to_the_power_13_minus_1` ;

-- faster 2 ^ 12
CREATE or replace view `numbers_2_to_the_power_12_minus_1`
AS select
   `numbers_2_to_the_power_13_minus_1`.`i` AS `i`
from `numbers_2_to_the_power_13_minus_1`
where (`numbers_2_to_the_power_13_minus_1`.`i` < (1 << 12));

-- faster 2 ^ 25
CREATE or replace VIEW `numbers_2_to_the_power_25_minus_1`
AS select
   ((`hi`.`i` << 12) | `lo`.`i`) AS `i`
from (`numbers_2_to_the_power_12_minus_1` `lo` join `numbers_2_to_the_power_13_minus_1` `hi`);

-- create table for results

drop table if exists numbers ;

create table `numbers` (
  `i` int(11) signed 
  , primary key(`i`)
) ENGINE=myisam DEFAULT CHARSET=latin1;

-- insert the numbers
insert into numbers(i)
select i from numbers_2_to_the_power_25_minus_1
where i <= 20000000 ;

drop view if exists numbers_2_to_the_power_25_minus_1 ;
drop view if exists numbers_2_to_the_power_12_minus_1 ;
drop table if exists numbers_2_to_the_power_13_minus_1 ;
drop view if exists two_to_the_power_13_minus_1 ;
drop view if exists two_to_the_power_12_minus_1 ;
drop view if exists two_to_the_power_08_minus_1 ;
drop view if exists two_to_the_power_04_minus_1 ;
drop view if exists two_to_the_power_02_minus_1 ;
drop view if exists two_to_the_power_01_minus_1 ;
Keith John Hutchison
  • 4,955
  • 11
  • 46
  • 64
  • While this one is definitely complicated, it works flawless from phpmyadmin sql execution into mysql and is very fast for 20,000,000. Simply changing the `20000000` in the code to anything from 1 to 33,554,431 ([2^25]-1) will work. Probably fits most people needs. If you need more and don't understand the code I suggest you use the PHP with few queries, as its easiest to understand but still not slow – ParoX Jan 14 '13 at 05:14
  • Cheers @WalterSchrabmair! – Keith John Hutchison Sep 13 '19 at 00:05
2

If speed is a concern, you should use LOAD DATA INFILE which is faster than INSERT according to mysql doc :

http://dev.mysql.com/doc/refman/5.5/en/insert-speed.html

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times
faster than using INSERT statements. See Section 13.2.6, “LOAD DATA INFILE Syntax”. 

Basically you generate a 20 million lines using your favorite language (php ?), then you load it with LOAD DATA INFILE.

http://dev.mysql.com/doc/refman/5.5/en/load-data.html

psadac
  • 2,312
  • 5
  • 32
  • 41
2

I typical way to create such a table is to start with:

select 0 as num union all select 1 union all select 2 union all
select 3 union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9

Now, in most databases, you can use a with statement and do the following:

with digits as (above query)
select d1.num+10*d2.num+100*d3.num+1000*d4.num+10000*d5.num+100000*d6.num+1000000*d7.num+10000000*87.num as num
from   digits d1 cross join
       digits d2 cross join
       digits d3 cross join
       digits d4 cross join
       digits d5 cross join
       digits d6 cross join
       digits d7 cross join
       (select 0 as num union all select 1) d8

Unfortunately, in MySQL you either need to create a temporary table or repeat the union all statement:

select d1.num+10*d2.num+100*d3.num+1000*d4.num+10000*d5.num+100000*d6.num+1000000*d7.num+10000000*d7.num as num
from (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d1 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d2 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d3 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d4 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d5 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d6 cross join
     (select 0 as num union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
     ) d7 cross join
     (select 0 as num union all select 1) d8

In MySQL, if you want to put this into a table, you can just use create table numbers as before the select. However, different databases have different syntax for dumping the results of a select into a table.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Can you modify your code so that it inserts all the numbers into a table titled `numbers` with column `i`. I am trying to make the solutions relatively universal – ParoX Jan 12 '13 at 22:55
  • You would just add `create table numbers as (num integer)` and then `insert into numbers `, if I understand correctly. – Gordon Linoff Jan 12 '13 at 23:12
  • OK got it, this gives me 20 million records of duplicates of 0-10 million. So: 0,0,1,1,2,2...9999999 – ParoX Jan 13 '13 at 00:14
  • @BHare . . .Check the numbers in the `select` clause. I must have left a 0 out somewhere. Or put in d6 twice. I fixed the statement. – Gordon Linoff Jan 13 '13 at 00:17
  • Hm. Still not right. I notice you never reference d8, but d7 twice. I changed the last d7 and d8 and it works but it goes: 0, 1000001, 1, 1000002, 2, 1000003... – ParoX Jan 13 '13 at 01:03
  • I was able to get it in order by attaching `ORDER BY num ASC` at the end but it takes 13 seconds over 6 seconds doing this. – ParoX Jan 13 '13 at 01:10
  • Is there a better way than just using the `ORDER BY` ? – ParoX Jan 14 '13 at 05:03
  • @BHare . . . SQL Tables are inherently unordered when they are stored. I think what you want is the `create table` statement, with the num column being the primary key on the table. – Gordon Linoff Jan 14 '13 at 14:06
0

I am not sure if you are trying to make one call make 20 million rows or call a row making call 20 million times. An example of the second case would be:

<?php
$i =0;
while($i <= 20000000){
$sql = mysql_query("INSERT INTO table_name VALUES ('$i')");
$i +=1;
}
?>

If you are looking for a SQL solution couldn't you try an adaption of

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 20000000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1
    CROSS JOIN sys.objects s2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

Taken from this post and is reported to make 10,000 rows in an average of 56.3 milliseconds.

Community
  • 1
  • 1
Devon Bernard
  • 2,250
  • 5
  • 19
  • 32
  • Just as a side note; if in your database table you have one column with auto increment property you could just replace the '$i' in the INSERT query with '' – Devon Bernard Jan 12 '13 at 21:45
  • I really don't think using PHP for this is really the fastest solution, in fact I know it's not. Also, avoiding using the outdated `mysql_query` when giving answers. – ParoX Jan 12 '13 at 21:56
  • @BHare: Alright; sorry I misunderstood I thought you wanted it just SQL compatible not actually only in SQL. – Devon Bernard Jan 12 '13 at 22:00
  • Your solution fits the question, it just isn't the best way to approach this. It took 39 seconds to do 1 million rows, where as links I referenced do the same in less than 1 second. Though, most MySQL users do know PHP so this may pass as the most simple solution and the time it takes to write/think/analyze all the other solutions would probably exceed the time needed for this method. So actually, it's a good solution. – ParoX Jan 12 '13 at 22:11
  • @BHare: Ok, and I did not know the time factor was a huge issue I thought this was just a one time task not something that would be executed a bunch. But alright I will read over the other peoples documentation and if I find anything I will post an alternate solution to 20 million. – Devon Bernard Jan 12 '13 at 22:13
  • To be honest, I come up with my solution a few minutes ago. Now it's just documentation alternatives for future googlers who come across this. Hopefully I can save someone some time. – ParoX Jan 12 '13 at 22:18
  • I saw that post earlier, was looking for it but lost it. Most of the solutions are for SQL-server and wouldn't execute in MySQL – ParoX Jan 12 '13 at 22:20
0

Apologies if this answer is referenced already. This took 18.79 secs on my machine (a DELL laptop if it matters)...

It's adapted from an old solution at http://datacharmer.blogspot.co.uk/2006/06/filling-test-tables-quickly.html but, crucially, this won't work with the default 'vanilla' InnoDB engine, and it will be much slower if you try to establish the PK at the outset.

On the plus side you get an extra 13.5 M rows for free!

drop table if exists numbers;
create table numbers ( id int not null) engine = myisam;

delimiter $$

drop procedure if exists fill_numbers $$
create procedure fill_numbers()
deterministic
begin
  declare counter int default 1;
  insert into numbers values (1);
  while counter < 20000000
  do
      insert into numbers (id)
          select id + counter
          from numbers;
      select count(*) into counter from numbers;
      select counter;
  end while;
end $$
delimiter ;

call fill_numbers();
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • I get `Column count of mysql.proc is wrong. Expected 20, found 16. ` – ParoX Jan 13 '13 at 00:16
  • Sorry. I don't know what that could be about. Works fine for me. – Strawberry Jan 13 '13 at 00:18
  • I did a mysql_upgrade which seems to have fixed that issue, but now I get `#1312 - PROCEDURE test.fill_numbers can't return a result set in the given context` which seems to be related to a bug in PHPmyadmin: http://stackoverflow.com/questions/2360371/mysql-stored-procedures-not-working-with-select-basic-question – ParoX Jan 14 '13 at 05:02
  • Are you obliged to use PHPMyAdmin? Why not use Workbench, or just fire up the CLI ? – Strawberry Jan 14 '13 at 08:06
  • Yes. The server(s) are local and I run them headless, unless workbench has a web interface? CLI is too much hassle. – ParoX Jan 14 '13 at 17:33
0

Adpoting psadac's answer of using LOAD DATA INFILE and the idea of BULK insertion applied to fwrite:

$fh = fopen("data_num.txt", 'a') or die("can't open file");
$i =1;
while($i <= 20000000) {
    $num_string .= "$i\n";
    if($i % 1000000 == 0) {
        fwrite($fh, $num_string);
        $num_string = "";
    }
    $i +=1;
}
fclose($fh);

$dbh->beginTransaction();
$query = "LOAD DATA INFILE '" . addslashes(realpath("data_num.txt")) . "' INTO TABLE numbers LINES TERMINATED BY '\n';";
    $sth = $dbh->prepare($query);
    $sth->execute();
$dbh->commit();
unlink("data_num.txt");

I had to use addslashes as I am using windows enviorment.

It's interesting to note that doing the BULK technique by writing only 20 times to file over 20 million resulted in ~10 seconds compared to ~75 seconds by just writing 20 million times. Using string concatenation over pushing values into an array and imploding yielded almost twice as fast.

ParoX
  • 5,685
  • 23
  • 81
  • 152
0

A simpler & faster solution

(Original code here)

CREATE TABLE `numbers` (
  `i` INT(11) SIGNED 
  , PRIMARY KEY(`i`)
) ENGINE=myisam DEFAULT CHARSET=latin1;

INSERT INTO numbers(i) SELECT @row := @row + 1 FROM 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2, 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t3, 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t4, 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t5, 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t6, 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t7, 
(SELECT 0 UNION ALL SELECT 1) t8, 
(SELECT @row:=0) ti;

On my laptop with MySQL 5.5.29 installed, compare the accepted answer with this one:

+-----------------+-------+---------------+
| Method          | Rows  | Time consumed |
+-----------------+-------+---------------+
| Accepted answer | 20M+1 |         42.4s |
+-----------------+-------+---------------+
| This one        | 20M   |         35.9s |
+-----------------+-------+---------------+

About 15% less time, no intermediate views or tables, and easier to read.

Community
  • 1
  • 1
Rockallite
  • 16,437
  • 7
  • 54
  • 48
-1

In response to Devon Bernard's answer, I decided to approach it using PDO Mysql PHP and use the concept of just a few queries. At first I tried to do it with just 1 big query but PHP ran out of memory with default settings, so I decided to tweak to run every 100,000th. Even if you allocate enough memory to hold, there is no significant improvement.

$i = 1;
$inserts = array();
while($i <= 20000000) {
    $inserts[] = "($i)";

    if($i % 100000 == 0) {
        $dbh->beginTransaction();
        $query = "INSERT INTO numbers(i) VALUES " . implode(',', $inserts) . ";";
            $sth = $dbh->prepare($query);
            $sth->execute();
        $dbh->commit();
        $inserts = array();
    }
    $i +=1;
}
ParoX
  • 5,685
  • 23
  • 81
  • 152