1

I have a set of numbers from MySQL within the range 1000 0000 (8 digits) to 9 999 999 999 (10 digits). It's supposed to be consecutive, but there are missing numbers. I need to know which numbers are missing.

The range is huge. At first I was going to use PHP to do this:

//MySqli Select Query
$results = $mysqli->query("SELECT `OCLC Number` FROM `MARC Records by Number`");

$n_array = array();
while($row = $results->fetch_assoc()) {
    $n_array[] = $row["OCLC Number"];
}

d($n_array);
foreach($n_array as $k => $val) {
    print $val . " ";
}

/* 8 digits */
$counter = 10000000;
$master_array = array();

/* 10 digits */
while ($counter <= 9999999999 ) {
    $master_array[] = $counter;
    $counter++;
    d($master_array);
}
d($master_array);    

$missing_numbers_ar = array_diff ($master_array, $n_array);
d($missing_numbers_ar);

d() is a custom function akin to var_dump().

However, I just realized it would take tons of time for this to be done. At the 15 minute mark, $master_array is being populated with only 4000 numbers.

How can I do this in a quicker way? MySQL-only or MySQL-and-PHP solutions both welcome. If the optimal solution depends on how many numbers are missing, please let me know how so. Tq.

Manoj Sharma
  • 1,467
  • 2
  • 13
  • 20
forgodsakehold
  • 870
  • 10
  • 26
  • 2
    dupe: http://stackoverflow.com/questions/4340793/how-to-find-gaps-in-sequential-numbering-in-mysql –  Jan 18 '17 at 02:36
  • It seems unlikely that you have 10 billion records in your table. Why not just dump out all the IDs in your database into a flat file, sort it, and see what gaps there are by iterating over line by line in a text file? That usually runs really fast. – tadman Jan 18 '17 at 04:10
  • Possible duplicate of [How to find gaps in sequential numbering in mysql?](http://stackoverflow.com/questions/4340793/how-to-find-gaps-in-sequential-numbering-in-mysql) – imel96 Jan 18 '17 at 06:27

2 Answers2

1

Your d() probably is the cause of slowness, please remove it, and make small changes in your code

while($row = $results->fetch_assoc()) {
    $n_array[$row["OCLC Number"]] = 1;
}

and

$missing_numbers_ar = [];

while ($counter++ <= 9999999999 ) {
    if (empty($n_array[$counter])) {
        $missing_numbers_ar[] = $counter;        
    }
}
Hieu Vo
  • 3,105
  • 30
  • 31
0

If the following is still slow I would be surprised. I also just noticed it is similar to @Hieu Vo's answer.

// Make sure the data is returned in order by adding
// an `ORDER BY ...` clause.
$results = $mysqli->query("SELECT `OCLC Number` 
            FROM `MARC Records by Number` 
            ORDER BY `OCLC Number`");

$n_array = array();
while($row = $results->fetch_assoc()) {
    // Add the "OCLC Number" as a key to the array.
    $n_array[$row["OCLC Number"]] = $row["OCLC Number"];
}

// assume the first array key is in fact correct
$i = key($n_array);
// get the last key, also assume it is not missing.
end($n_array);
$max = key($n_array);
// reset the array (should not be needed)
reset($n_array);
do {
    if (! $n_array[$i]) {
        echo 'Missing key:['.$i.']<br />';
        // flush the data to the page as you go.
        flush();
    }
} while(++$i <= $max);
Tigger
  • 8,980
  • 5
  • 36
  • 40