3

I'm using PHP 5.5.9 and MySQL 5.5.44 with mysqlnd 5.0.11-dev on Ubuntu 14.04 LTS. The following statement fails to prepare:

$db->prepare("SELECT nr.x FROM (SELECT ? AS x) AS nr")

This is despite the fact that the following statement prepares successfully:

$db->prepare("SELECT nr.x FROM (SELECT '1337' AS x) AS nr")

What causes this difference? The manual says "Parameter markers can be used only where data values should appear, not for SQL keywords, identifiers, and so forth." But this is for a data value.

Not PDO's fault

The same thing happens in the stand-alone client:

mysql -uredacted -predacted redacted
-- Type 'help;' or '\h' for help.
SELECT nr.x FROM (SELECT '1337' AS x) AS nr;
-- x
-- 1337
-- 1 row in set (0.00 sec)
PREPARE workingstmt FROM 'SELECT nr.x FROM (SELECT ''1337'' AS x) AS nr';
-- Query OK, 0 rows affected (0.00 sec)
-- Statement prepared
DEALLOCATE PREPARE workingstmt;
-- Query OK, 0 rows affected (0.00 sec)
PREPARE brokenstmt FROM 'SELECT nr.x FROM (SELECT ? AS x) AS nr';
-- ERROR 1054 (42S22): Unknown column 'nr.x' in 'field list'
^D
-- Bye

My motivation

I'm trying to add a row to a table that has an auto-incrementing primary key. In InnoDB's default auto-increment locking mode, which the manual calls "consecutive", InnoDB skips an auto-increment value when a row might be inserted but is not, as is the case with INSERT IGNORE or ON DUPLICATE KEY UPDATE that runs into an existing row whose UNIQUE values match those of the row being inserted. (These are called "mixed-mode inserts" in the manual.)

Every few hours, I import a feed from my supplier. This has about 200,000 rows, and all but on average 200 of these rows have unique values that correspond to values already present in the table. So if I were to use INSERT IGNORE or ON DUPLICATE KEY UPDATE all the time, I'd burn through 199,800 IDs every few hours. So I don't want to use INSERT IGNORE or ON DUPLICATE KEY UPDATE for fear that I might exhaust the 4.2 billion limit of INTEGER UNSIGNED with repeated inserts over time to a table with the same UNIQUE key. I don't want to switch the column to BIGINT type because 32-bit PHP has no type with the same semantics as MySQL BIGINT. The server administrator is unwilling to switch to 64-bit PHP or to change innodb_autoinc_lock_mode for all users of the server.

So instead, I decided to try INSERT INTO ... SELECT, creating a 1-row table with the unique key columns in a subquery and left joining it to the main table to reject unique key values that already exist. (The manual says INSERT INTO ... SELECT is a "bulk insert", which does not burn IDs.) The intent is to do something like this:

INSERT INTO the_table
(uniquecol, othercol1, othercol2)
SELECT nr.uniquecol, :o1 AS othercol1, :o2 AS othercol2
FROM (
  SELECT ? AS uniquecol
) AS nr
LEFT JOIN the_table ON nr.settlement_id = the_table.settlement_id
WHERE the_table.row_id IS NULL

This failed, giving the PDO error: ["42S22",1054,"Unknown column 'settlement_id' in 'field list'"]

<?php // MCVE follows

/* Connect to database */
$pdo_dsn = 'mysql:host=127.0.0.1;dbname=redacted';
$pdo_username = 'redacted';
$pdo_password = 'redacted';
$pdo_options = [PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8',];
$db = new PDO($pdo_dsn, $pdo_username, $pdo_password, $pdo_options);
$pdo_dsn = $pdo_username = $pdo_password = 'try harder';

// ensure that PDO doesn't convert everything to strings
// per http://stackoverflow.com/a/15592818/2738262
$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$db->setAttribute(PDO::ATTR_STRINGIFY_FETCHES, false);

/* Create mock data with which to test the statements */
$prep_stmts = ["
CREATE TEMPORARY TABLE sotemp (
  file_id INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  settlement_id VARCHAR(30) NOT NULL,
  num_lines INTEGER UNSIGNED NOT NULL DEFAULT 0,
  UNIQUE (settlement_id)
)
","
INSERT INTO sotemp (settlement_id, num_lines) VALUES
('15A1', 150),
('15A2', 273),
('15A3', 201)
"];
foreach ($prep_stmts as $stmt) $db->exec($stmt);

/* Now the tests */

$working_stmt = $db->prepare("
SELECT nr.settlement_id
FROM (
  -- change this to either a value in sotemp or one not in sotemp
  -- and re-run the test program
  SELECT '15A3' AS settlement_id
) AS nr
LEFT JOIN sotemp ON nr.settlement_id = sotemp.settlement_id
WHERE sotemp.file_id IS NULL
");
if ($working_stmt) {
  $working_stmt->execute();
  $data = $working_stmt->fetchAll(PDO::FETCH_ASSOC);
  echo "Working: ".json_encode($data)."\n";
} else {
  echo "Working statement failed: ".json_encode($db->errorInfo())."\n";
}

$broken_stmt = $db->prepare("
SELECT nr.settlement_id
FROM (
  SELECT ? AS settlement_id
) AS nr
LEFT JOIN sotemp ON nr.settlement_id = sotemp.settlement_id
WHERE sotemp.file_id IS NULL
");
if ($broken_stmt) {
  $broken_stmt->execute(['15A4']);
  $data = $broken_stmt->fetchAll(PDO::FETCH_ASSOC);
  echo "Broken: ".json_encode($data)."\n";
} else {
  echo "Broken statement failed: ".json_encode($db->errorInfo())."\n";
}

What is causing this error? And is there a better way to insert a row only if the primary key does not exist without exhausting auto-increment IDs?

Damian Yerrick
  • 4,602
  • 2
  • 26
  • 64
  • 1
    What are you doing that you think would exhaust the 4.2b integers? You could insert a million rows a day and still it would take over 11 years to run out of numbers. If you are so worried about running out of numbers though, then look into using a GUID/UUID for the primary key. The primary key doesn't have to be an integer, that is just most common. – Jonathan Kuhn Aug 20 '15 at 17:44
  • I do agree with @JonathanKuhn I do use UUID() as primary key. – Maytham Fahmi Aug 20 '15 at 17:53
  • Don't over-engineer, it makes things harder to maintain. You'll never run out of INT, so use `INSERT IGNORE` or `ON DUPLICATE KEY UPDATE` and move on! If applicable, you could also do a `SELECT` before the insert, to check if the row exists. @JonathanKuhn 's comment is also another good alternative. – rlanvin Aug 20 '15 at 17:58
  • Also **IF** you ever run out of INT, the server administrator will be forced to migrate to 64bit architecture. So don't worry. – rlanvin Aug 20 '15 at 17:59
  • Another thought is that large datasets are not new. This is an issue that has been thought of before. Even if you expect to have over 4b rows, you would be better off trying to find an existing pattern/solution than to come up with some clever way yourself, even if just for maintainability. And while 32bit php doesn't support over 32bit integers, php, being loosely typed, will not overflow and convert the number to a string and there are libraries to handle math with pretty much infinite bit length. – Jonathan Kuhn Aug 20 '15 at 18:01
  • The feed I'm importing has only 200,000 rows, and I get one every few hours. Typically, about 199,800 of them have the same values in the unique columns. If I were to upsert them all with `ODKU`, I'd end up burning 199,800 IDs every few hours. – Damian Yerrick Aug 20 '15 at 18:06
  • The fact remains that `$db->prepare("SELECT nr.x FROM (SELECT ? AS x) AS nr")` fails. – Damian Yerrick Aug 20 '15 at 18:15
  • @JonathanKuhn Use strings and forget about sorting them by even rough order of insertion, as "4444444444" is greater than "10000000000000", and UUIDs are unordered. Use libraries, and once all your ID comparisons are running through GMP, forget about finishing the script in a reasonable time. – Damian Yerrick Aug 20 '15 at 18:56
  • @rlanvin Migrate to 64-bit and forfeit the remainder of the server lease. – Damian Yerrick Aug 20 '15 at 18:56
  • if I had to guess it would be that prepare thinks you are trying to bind a column name and you can't bind column names, only values. so the column `'settlement_id'` (with quotes making it a string) is not found and causing an issue. I'm not even 100% sure if you can bind a value in the field list. Have you tried just hard coding the id to see if it would work? – Jonathan Kuhn Aug 20 '15 at 19:02
  • @JonathanKuhn It works when hardcoded but not when prepared. Please compare the working statement to the broken statement in the MCVE. – Damian Yerrick Aug 20 '15 at 19:07
  • @tepples Even 200k rows every few hours you still need a couple of years to reach the limit, your server lease will surely be over by then, be realistic. But alright, what about selecting before inserting then? That's the simplest solution, it'll work wonders. Else if your feed ordered somehow, what about processing only the rows having an ID > last inserted ID from this cron? – rlanvin Aug 20 '15 at 19:19

1 Answers1

5

Your latest edit made the question very clear, so I'll attempt an answer: the cause of this difference is the placeholder.

As documented here, placeholders can only be used in certain places in the query. In particular:

Parameter markers can be used only where data values should appear, not for SQL keywords, identifiers, and so forth.

Now you might have noticed that SELECT ? as x prepares fine, but not SELECT nr.x FROM (SELECT ? AS x) AS nr. Why is that? Well this is best explained by an anonymous author on PHP's doc, so let me copy/paste:

There is a common misconception about how the placeholders in prepared statements work: they are not simply substituted in as (escaped) strings, and the resulting SQL executed. Instead, a DBMS asked to "prepare" a statement comes up with a complete query plan for how it would execute that query, including which tables and indexes it would use, which will be the same regardless of how you fill in the placeholders.

So simply put: because you are using a placeholder in a subquery in the FROM clause, MySQL cannot calculate the execution plan of the query.

In other words, since your query will always change, there is not "template" that can be prepared for it.

Therefore if you really want to use this query, you need to use a normal (non-prepared) query, or turn back on PDO's emulated prepared statements.

That being said, please, do consider the various alternatives offered in the comments section. There are much better solutions for what you are trying to achieve.

rlanvin
  • 6,057
  • 2
  • 18
  • 24