7

I have 4 MySQL tables with depending on each other via FOREIGN KEYs.

Please checkout this below schema for tables structure:

CREATE DATABASE IF NOT EXISTS courses
    CHARACTER SET latin1
    COLLATE latin1_bin;

CREATE TABLE IF NOT EXISTS courses.institution
(
    icode INT UNSIGNED NOT NULL AUTO_INCREMENT,
    iname VARCHAR(255) NOT NULL,
    PRIMARY KEY (icode),
    UNIQUE (iname)
)
    ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS courses.cities
(
    ccode INT UNSIGNED NOT NULL AUTO_INCREMENT,
    cname VARCHAR(255) NOT NULL,
    PRIMARY KEY (ccode),
    UNIQUE (cname)
)
    ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS courses.skills
(
    scode INT UNSIGNED NOT NULL AUTO_INCREMENT,
    sname VARCHAR(255) NOT NULL,
    PRIMARY KEY (scode),
    UNIQUE (sname)
)
    ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS courses.relation
(
    icode INT UNSIGNED NOT NULL,
    scode INT UNSIGNED NOT NULL,
    ccode INT UNSIGNED NOT NULL,
    UNIQUE KEY ucols (icode, scode, ccode),
    FOREIGN KEY (icode) REFERENCES courses.institution (icode),
    FOREIGN KEY (scode) REFERENCES courses.skills (scode),
    FOREIGN KEY (ccode) REFERENCES courses.cities (ccode)
)
    ENGINE = InnoDB;

Currently I'm executing this below queries to insert just one record in relation table.

It costs 4 INSERT queries and 3 SELECT subqueries for just one insert on each time.

INSERT IGNORE INTO institution(iname) VALUES ('ABC Learners');
INSERT IGNORE INTO skills(sname) VALUES ('PHP');
INSERT IGNORE INTO cities(cname) VALUES ('Bangalore');

INSERT IGNORE INTO relation (icode, scode, ccode) VALUES (
    (SELECT icode FROM institution WHERE iname = 'ABC Learners'),
    (SELECT scode FROM skills WHERE sname = 'PHP'),
    (SELECT ccode FROM cities WHERE cname = 'Bangalore')
);

Is it necessary to execute all this queries every time? or is there any better way to perform this action in single or few queries?

Checkout this below simple PHP code. In this code for just inserting 7 records on relation table it executes 4 INSERT queries with 3 SELECT subqueries on each record.

It costs total 7 * 7 = 49 queries for 7 records. How to solve this?

<?php

$db = new mysqli('localhost', 'user', '****', 'courses');

$records = [
    ['ABC Learners', 'CSS', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Bangalore'],
    ['ABC Learners', 'HTML', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Bangalore'],
];

foreach ($records as $record) {

    list($institute, $skill, $city) = $record;

    $db->query("INSERT IGNORE INTO institution (iname) VALUES ('{$institute}')");
    $db->query("INSERT IGNORE INTO skills (sname) VALUES ('{$skill}')");
    $db->query("INSERT IGNORE INTO cities (cname) VALUES ('{$city}')");

    $db->query(
        "INSERT IGNORE INTO relation (icode, scode, ccode) VALUES (" .
        "(SELECT icode FROM institution WHERE iname = '{$institute}'), " .
        "(SELECT scode FROM skills WHERE sname = '{$skill}'), " .
        "(SELECT ccode FROM cities WHERE cname = '{$city}'))"
    );
}

$db->close();

NOTE: Above script is example purpose. Using batch mode or disabling auto commit is not useful for me. Because many times I need to add single new record to relation table (based on user request via web panel)


UPDATE 1:

After some research and benchmarks I have created a MySQL stored function to speed up this process and it increased performance by 250 - 300%

Please checkout function here:

DELIMITER $$
CREATE FUNCTION courses.record(i_name VARCHAR(255), s_name VARCHAR(255), c_name VARCHAR(255)) RETURNS INT
BEGIN
    DECLARE _icode, _scode, _ccode INT UNSIGNED;

    SELECT icode INTO _icode FROM institution WHERE iname = i_name;
    SELECT scode INTO _scode FROM skills WHERE sname = s_name;
    SELECT ccode INTO _ccode FROM cities WHERE cname = c_name;

    IF _icode IS NULL THEN
        INSERT IGNORE INTO institution (iname) VALUES (i_name);
        SELECT icode INTO _icode FROM institution WHERE iname = i_name;
    END IF;

    IF _scode IS NULL THEN
        INSERT IGNORE INTO skills (sname) VALUES (s_name);
        SELECT scode INTO _scode FROM skills WHERE sname = s_name;
    END IF;

    IF _ccode IS NULL THEN
        INSERT IGNORE INTO cities (cname) VALUES (c_name);
        SELECT ccode INTO _ccode FROM cities WHERE cname = c_name;
    END IF;

    INSERT IGNORE INTO relation (icode, scode, ccode) VALUES (_icode, _scode, _ccode);

    RETURN ROW_COUNT();
END $$
DELIMITER ;

Now, this below PHP script can insert one record in relation table with just one query

<?php

$db = new mysqli('localhost', 'user', '***', 'courses');

$records = [
    ['ABC Learners', 'CSS', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Bangalore'],
    ['ABC Learners', 'HTML', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Bangalore'],
];

$query = $db->prepare("SELECT record (?, ?, ?)");
$query->bind_param('sss', $institute, $skill, $city);

foreach ($records as $record) {
    list($institute, $skill, $city) = $record;
    $query->execute();
}

$db->close();

This MySQL stored function increased the performance. But, still I'm using multiple INSERT and SELECT statements in this function. Is it possible to optimize this function with few statements to gain more performance?

Temüjin
  • 15,371
  • 8
  • 35
  • 57
  • An insert can only insert to 1 table at a time. Your last insert doesn't need the selects , why you didn't use insert..values as you did on the other inserts? – P.Salmon Aug 20 '19 at 08:05
  • 1
    @P.Salmon because the values needed for the last insert are `auto_increment` ids, so the OP doesn't know them before successfully inserted the 3 others – Kaddath Aug 20 '19 at 08:43
  • One suggestion: in `relation` table, instead of defining combination `(icode, scode, ccode)` as `UNIQUE` key, you can rather define it as a `PRIMARY` key instead. – Madhur Bhaiya Aug 22 '19 at 12:45
  • @ChandraNakka Now you are executing one prepared statement multiple times in a loop, exactly as I proposed in my answer. But why didn't you use a transaction? – Paul Spiegel Aug 26 '19 at 12:15

4 Answers4

6

1. Change UNIQUE to PRIMARY key:

In absence of an explicit Primary Key defined, InnoDB will create its own hidden primary key for every row. But in your case, in the table relation, the current UNIQUE key is a better candidate to be defined as PRIMARY key instead (NOT NULL and Uniqueness satisfied). So change that to PRIMARY KEY instead.

2. Utilize Batch Inserts/Selects:

We can basically break this into 4 Inserts and 3 Selects overall, by using batch insert and select, and utilizing some application catching. Check the code below (with comments):

$records = [
    ['ABC Learners', 'CSS', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Bangalore'],
    ['ABC Learners', 'HTML', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Bangalore'],
];

// Create a copy of records to avoid changing the original
$records_copy = $records;

// Get unique institutions, skills and cities
$i = array_unique(array_map('array_shift', $records_copy)); 
$s = array_unique(array_map('array_shift', $records_copy)); 
$c = array_unique(array_map('array_shift', $records_copy)); 

// Prepare batch insert and select SQL for institution table
$i_ins_sql = "INSERT IGNORE INTO institution (iname) VALUES ";
$i_sel_sql = "SELECT icode, iname FROM institution WHERE iname IN (";

foreach ($i as $v) {

    $i_ins_sql .= "('" . $db->real_escape_string($v) . "'),";
    $i_sel_sql .= "'" . $db->real_escape_string($v) . "',";
}

// Execute the INSERT query
$db->query( rtrim($i_ins_sql, ',') );

// Execute the SELECT query and fetch the query result and store (cache) it
// Key of the cache array would be name (string) and the value would be 
// the code (integer)
$i_cache = array();
$i_sel_q = $db->query( rtrim($i_sel_sql, ',') . ")" );
while ( $row = $i_sel_q->fetch_assoc() ) {
    $i_cache[$row['iname']] = $row['icode'];
}

/**
 * REPEAT the INSERT and SELECT steps for the skills and cities tables, 
 * using $s and $c arrays respectively, with appropriate field names
 * Eventually, you would have cache arrays, $s_cache and $c_cache.
 **/

// Now let's do a batch INSERT INTO relation table 
$sql = "INSERT IGNORE INTO relation (icode, scode, ccode) VALUES ";

// Loop over original records array
foreach ($records as $record) {

    $sql .= "(" . (int)$i_cache[$record[0]] . ","
                . (int)$i_cache[$record[1]] . "," 
                . (int)$i_cache[$record[2]] . "),";
}

// Execute the Batch INSERT query into relation table
$db->query( rtrim($sql, ',') );

Last, but not least: Your code is open to SQL injection related attacks. Even real_escape_string cannot secure it completely. Please learn to use Prepared Statements instead.

Temüjin
  • 15,371
  • 8
  • 35
  • 57
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
4

Two words of warning:

  • INSERT IGNORE (and IODKU and virtually any other technique) will "burn" ids. That is, it preallocates the AUTO_INCREMENT values and they are lost.
  • Be careful about what is, versus is not, inside the main transaction. There is no need to rollback a normalization on failure.

When there are a lot of rows to insert, it is most efficient to do them in a batch.

  • Build a temp table with the names, plus columns for the ids.
  • Normalize one id-name pair (no transaction):
    • INSERT ... SELECT ... LEFT JOIN ... -- insert only the new ones.
    • UPDATE temp_table SET id = ... -- (from normalization table) to get all the ids.
  • (repeat for each normalization table)
  • INSERT INTO real_table SELECT id, ... FROM temp_table

Details: http://mysql.rjweb.org/doc.php/staging_table#normalization

(This is perhaps a fixed 9 queries, not 7*7 as in your example.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • The first warning gives very valuable information. Thanks a lot. – Temüjin Aug 23 '19 at 07:54
  • This is also not provided solution for my question. Still, I'm not received any valuable solution from other answers on this page. But, this answer gives very valuable information if compare to other answers. So, I want to allocate bounty reward to this. – Temüjin Aug 30 '19 at 05:16
  • @ChandraNakka - Yes, it probably takes 2 SQLs. However, if the data can be batched, the link provided shows how to do the task in 2 SQL statements _per batch_ instead of 2 _per row_. – Rick James Aug 30 '19 at 05:26
3

You should use prepared statements with placeholders to prevent SQL injections. And you should use a transaction for data consistency.

The good news: Executing prepared statemnts multiple times in a single transaction can also improve the performance. So I would try the following:

  1. Prepare all four INSERT statements
  2. Bind the parameters
  3. Start the transaction
  4. Execute the statements in a loop once for each institute-skill-city combination
  5. Commit the transaction
$db = new mysqli('localhost', 'user', '****', 'courses');

$records = [
    ['ABC Learners', 'CSS', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Bangalore'],
    ['ABC Learners', 'HTML', 'Bangalore'],
    ['ABC Learners', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'PHP', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Hyderabad'],
    ['XYZ Solutions', 'JAVA', 'Bangalore'],
];

$stmtInstitution = $db->prepare("INSERT IGNORE INTO institution (iname) VALUES (?)");
$stmtSkills = $db->prepare("INSERT IGNORE INTO skills (sname) VALUES (?)");
$stmtCities = $db->prepare("INSERT IGNORE INTO cities (cname) VALUES (?)");
$stmtRelation = $db->prepare("
    INSERT IGNORE INTO relation (icode, scode, ccode) VALUES (
      (SELECT icode FROM institution WHERE iname = ?),
      (SELECT scode FROM skills      WHERE sname = ?),
      (SELECT ccode FROM cities      WHERE cname = ?)
    )
");

$stmtInstitution->bind_param('s', $institute);
$stmtSkills->bind_param('s', $skill);
$stmtCities->bind_param('s', $city);
$stmtRelation->bind_param('sss', $institute, $skill, $city);

$db->begin_transaction();

foreach ($records as $record) {
    list($institute, $skill, $city) = $record;

    $stmtInstitution->execute();
    $stmtSkills->execute();
    $stmtCities->execute();
    $stmtRelation->execute();
}

$db->commit();
$db->close();

This might be not as fast as bulk inserts. But it should be much faster than your original solution. And it is not too complicated.

Update

Regarding performance: See Bill Karwin's summary (slide 50) of Load Data Fast! and compare "multi stmt per txn" (as in this answer), "multi row per stmt" (as you originally asked for) and "one row per stmt/txn" (as in your original code).

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
0

It's overcome if you already have Id's of foreign keys while inserting, In your client-side you need to display a list of institution and cities and skills, Which all are contains id, name and other related pieces of information, So when user select particular institution,cities & skills You need to add respective Id's into your request data. Ex

const { institutionId, skillId, cityId } = req.body;

So you need to just perform only one insert query;

INSERT IGNORE INTO relation (icode, scode, ccode) VALUES ( 
institutionId, skillId, cityId);

Which is much more faster.

Neel Rathod
  • 2,013
  • 12
  • 28
  • 1
    This is usually unsafe if the server can serve multiple clients: you're not guaranteed to have the right id client-side if another client is doing an insert quite at the same time and the first one doesn't have the updated list yet.. (note that you are answering in Javascript to a PHP question) – Kaddath Aug 20 '19 at 08:46
  • In that case, we have user data ( `req.user` ) to identify which use initiate that request, and obviously, there is a validation to check that id exists or not in DB. – Neel Rathod Aug 20 '19 at 09:31