I need to obfuscate a MySQL table's IDs using a MySQL-only solution (no application code like PHP), and generate UNIQUE values.
Here's what I have managed to get working (using a MySQL trigger to update the row that was inserted with a comment_hash value. The problem is I get a "Duplicate Entry" error. I know hashes are not unique but how can I map a UNIQUE _comments.id value to a UNIQUE 4-character hash?)
DROP TABLE IF EXISTS _comments;
CREATE TABLE IF NOT EXISTS _comments (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`comment_hash` VARCHAR(255) NOT NULL,
`user_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `comment_user_hash_parentid_unique` (`user_id`,`comment_hash`)
) ENGINE=INNODB;
DROP TRIGGER IF EXISTS `test_add_comment_hash_on_insert_into_comments_table`;
DELIMITER //
CREATE TRIGGER test_add_comment_hash_on_insert_into_comments_table
BEFORE INSERT ON _comments
FOR EACH ROW
BEGIN
SET NEW.comment_hash = right(md5(LAST_INSERT_ID()+1),4);
END
//
DELIMITER ;
And then in my seeder (I am using Laravel for the convenience of multiple inserts in a PHP for loop), but you can use anything for fast inserts.
<?php
use Illuminate\Database\Seeder;
use Faker\Generator as Faker;
// Call using command: php artisan db:seed --class=CommentsFakeDataSeeder
class WtfFakeDataSeeder extends Seeder
{
/**
* Seed the application's database.
*
* @return void
*/
public function run(Faker $faker)
{
echo "Seeding _comments... \n";
DB::statement('SET FOREIGN_KEY_CHECKS=0;');
$fake_comments = [];
for ($i = 0; $i < 10000; $i++) {
$fake_comments[] = [
'user_id' => 1,
];
}
DB::table('_comments')->insert($fake_comments);
DB::statement('SET FOREIGN_KEY_CHECKS=1;');
}
}
I invoke the seeder with:
php artisan db:seed --class=WtfFakeDataSeeder
But I always end up with an integrity constraint violation
Doctrine\DBAL\Driver\PDOException::("SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1-849b' for key 'comment_user_hash_parentid_unique'")
With lowercase-only alphanumeric characters allowed in the 4-character comment_hash, we should have 36^4 = 1679616 values. Yet, collisions occur with < 10K inserts as you can see below. md5() isn't the right solution here..do you know one that is? id's are always unique...whether or not I combine them with userID (to introduce more entropy), how can I map them to UNIQUE 4-character hash values?