-1

I have a simple article and tag_map tables as

CREATE TABLE Articles
(
  ArticleID int(11) unsigned NOT NULL AUTO_INCREMENT,
  Title varchar(255),
  PRIMARY KEY(ArticleID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_general_ci

CREATE TABLE Tags
(
  TagID int(11) unsigned NOT NULL AUTO_INCREMENT,
  Tag varchar(255),
  UNIQUE INDEX(Tag),
  PRIMARY KEY(TagID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_general_ci

CREATE TABLE TagMap
(
  ArticleID int(11) unsigned NOT NULL,
  TagID int(11) unsigned NOT NULL,
  INDEX(TagID),
  PRIMARY KEY(ArticleID,TagID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_general_ci

I add tags via PHP

$result = $mysqli->query("SELECT TagID FROM Tags WHERE Tag='$tag'");

if($result->num_rows == 1) {
    $row = $result->fetch_assoc();
    $tag_id = $row['TagID'];
}
else {
    $mysqli->query("INSERT INTO Tags (Tag) VALUES ('$tag')");
    $tag_id = $mysqli->insert_id;
}

$mysqli->query("INSERT INTO TagMap (ArticleID,TagID) VALUES ($article_id,$tag_id)");

I wonder if there is a faster way to do this in one query within MySQL.

Here, I need 2 or 3 queries for adding each tag.

Additionally, I hope to find a way for batch INSERT (possibly via LOAD DATA LOCAL INFILE) when we have a list of tags as

ArticleID,Tag
1,tag2
2,tag11
4,tag3
ADyson
  • 57,178
  • 14
  • 51
  • 63
Googlebot
  • 15,159
  • 44
  • 133
  • 229
  • Yes, batch inserting from a flat file might be faster. Do you need help with that? – Tim Biegeleisen Jun 10 '21 at 12:06
  • 1
    **Warning:** You are wide open to [SQL Injections](https://php.net/manual/en/security.database.sql-injection.php) and should use parameterized **prepared statements** instead of manually building your queries. They are provided by [PDO](https://php.net/manual/pdo.prepared-statements.php) or by [MySQLi](https://php.net/manual/mysqli.quickstart.prepared-statements.php). Never trust any kind of input! Even when your queries are executed only by trusted users, [you are still in risk of corrupting your data](http://bobby-tables.com/). [Escaping is not enough!](https://stackoverflow.com/q/5741187) – Dharman Jun 10 '21 at 12:12
  • LOAD DATA cannot insert when lookup is needed. I recommend you to create stored procedure which will load your data file into temporary table and then insert into TagMap table. From PHP side you'll need to call this SP only (and, maybe, provide the filename into). – Akina Jun 10 '21 at 12:13
  • I assume `INSERT INTO Tags (ArticleID,TagID)` is meant to be `INSERT INTO TagMap (ArticleID,TagID)`? Also why do you need to do `INSERT INTO Tags (Tag) ` when you've just selected from it? And where is `$article_id` coming from, in that code? In short, the example code doesn't make much sense. – ADyson Jun 10 '21 at 12:14
  • @ADyson it was typo. `SELECT` is for when the tag exists. Look at `if($result->num_rows==1)` – Googlebot Jun 10 '21 at 12:27
  • Ah ok. The poor formatting of your code made it hard to see that. I'll fix it – ADyson Jun 10 '21 at 12:29
  • So where does $article_id come from then? You didn't clarify that. And $tag, for that matter. Is it user input? Or you're reading from a file? – ADyson Jun 10 '21 at 12:31
  • @ADyson `article_id` is provided with each tag, like the example data given at the end of the question. The PHP code is inside a loop. – Googlebot Jun 10 '21 at 12:34
  • I see, thanks. In that case I think Akina's answer below is on the right lines. – ADyson Jun 10 '21 at 12:35
  • @Dharman I disagree that without prepared statements, one is wide open to SQL injections. For prepared statements, you need to define the type of the field. Almost all examples of SQL injections are about using `real_escape_string` or `charset` incorrectly. `real_escape_string` is for strings only; if using the correct method for the correct type; `(int)`, `(float)`, etc, the risk of SQL injections are minimum. I don't say it is bullet-proof, but it is fairly safe. – Googlebot Jun 10 '21 at 15:30
  • @Googlebot IMHO "fairly safe" isn't really good enough. Anyway your code is inserting strings and isn't even escaping them – ADyson Jun 10 '21 at 15:42
  • @ADyson This is obviously part of the code. You don't see where `$tag` comes from; because it was irrelevant to my question. `$tag`. Not escaping was not a matter of SQL injection, the code simply would not work for tags containing `'`. – Googlebot Jun 10 '21 at 15:52
  • @Googlebot Sure, but why would you even bother with `real_escape_string` when you have prepared statements. They are easier and safer. It's the best option all around. – Dharman Jun 10 '21 at 16:04
  • @Dharman definitely, prepared statements are the safe and standard way for a production project. I just mean SQL injection is almost always the result of neglection in coding rather than the method used. I wrote the above code as simple as possible to make my question clear for the readers. I wrote it here, not even copy/paste from my actual code (as I had to delete lots of irrelevant parts). – Googlebot Jun 10 '21 at 16:12
  • Next time if you want to make as simple as possible for the readers then please use parameter binding. We have no idea why you are injecting PHP variables into the SQL or if you are type casting/escaping them when doing so. – Dharman Jun 10 '21 at 16:13
  • P.S. Whenever I see SQL injection, use of the deprecated `utf8` charset and use of mysqli API I downvote a post. Such posts are not very useful to future people as they are spreading unrecommended practices. If you want to make a question useful, use best practices. – Dharman Jun 10 '21 at 16:15
  • @Dharman please keep in mind that my question was not even about PHP. I showed a PHP code and explained I want to do this within MySQL by reading from a file data (as the accepted answer does). – Googlebot Jun 10 '21 at 16:22

2 Answers2

3

A pattern:

CREATE PROCEDURE load_to_TagMap ()
BEGIN
-- create table for loading data
CREATE TABLE tmp_TagMap ( ArticleID INT, Tag VARCHAR(255) ) ENGINE = Memory;
-- load data from file
LOAD DATA INFILE '/directory/filename.ext'
    INTO TABLE tmp_TagMap
    SKIP 1 LINES;
-- add absent tags into Tags table
INSERT INTO Tags (Tag)
    SELECT tmp_TagMap.Tag
    FROM tmp_TagMap 
    LEFT JOIN Tags USING (Tag)
    WHERE Tags.Tag IS NULL;
-- insert loaded data into TagMap table with lookup
INSERT INTO TagMap
    SELECT ArticleID, TagID
    FROM Tag
    JOIN tmp_TagMap USING (Tag);
-- remove loaded data table
DROP TABLE tmp_TagMap;
END

From PHP simply execute CALL load_to_TagMap;.

Akina
  • 39,301
  • 5
  • 14
  • 25
  • Judging by the OP's code I think you might need an extra step in there to insert any entries from the file into Tag if they don't already exist, before doing the insert into TagMap – ADyson Jun 10 '21 at 12:34
  • @ADyson Yes, this is possible. I'll add. – Akina Jun 10 '21 at 12:34
  • You don't even need to LOAD DATA INFILE if you configure the file as a table using CSV engine. – Bill Karwin Jun 10 '21 at 13:51
  • @BillKarwin Yes, this is possible. But this is the most strict solution, which needs the most amount of conditions for to be applicable. I try to avoid the solutions which deals with CSV engine... – Akina Jun 10 '21 at 15:32
  • It would be easier to do the second step as `INSERT IGNORE INTO Tags (Tag) SELECT DISTINCT tmp_TagMap.Tag FROM tmp_TagMap`. The duplicates will be skipped, but we save a `JOIN`. – Googlebot Jun 10 '21 at 16:26
  • @Googlebot Either I didn'd see `UNIQUE INDEX(Tag),` or you add it later. With it, of course, INSERT IGNORE must be used. – Akina Jun 10 '21 at 17:17
  • 'UNIQUE INDEX(Tag)' was surely there. Without it, the `Tags` table would be redundant, as `Tags` would be equal to `TagMap`. – Googlebot Jun 10 '21 at 17:48
  • no, giving me the idea was sufficient. Moreover, I just suggested `DISTINCT Tag`. Missing `IGNORE` is harmless, just increasing the `TagID` AUTO_INCREMENT. – Googlebot Jun 10 '21 at 18:00
1

Overnormalization.

"Tags" tend to be short strings, correct? The overhead of creating an INT for each and doing a secondary lookup is not worth it. Replace Tags and TagMap with

CREATE TABLE Tags
(
  ArticleID int(11) unsigned NOT NULL,
  Tag VARCHAR(255) NOT NULL,
  PRIMARY KEY(ArticleID,Tag)
  INDEX(Tag, ArticleID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_general_ci

This may be of interest: http://mysql.rjweb.org/doc.php/lists

More

SELECT COUNT(*) FROM Tags WHERE Tag = '...';

is very efficient, even with Tag being VARCHAR. This also simplifies your code -- you don't need extra code to bump a counter; also it is easy to decrement the counters when an Article is removed:

DELETE FROM Tags WHERE ArticleID = ...;

If you expect to have 100K articles per tag, then there could be a performance problem. How many articles and tags are you expecting?

If the bigger picture is "Display the 'latest' 10 Articles for Tag='...', then the performance problem will be in the ORDER BY date DESC LIMIT 10. Currently that involves a join to the Article table, check for not 'deleted', sort, etc. But I have a solution for that: http://mysql.rjweb.org/doc.php/lists

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I initially thought of that. It is very beneficial if retrieving rows by `ArticleID`. However, separate `Tags` tables have some advantages. (i) you can store meta info in columns about each tag (like the number of Articles). (ii) Although retrieving rows by `varchar` is not much different from `int`, but for a long table, the performance is considerably different (in my practice, at least). – Googlebot Jun 10 '21 at 15:33
  • @Googlebot - Those are solvable. See my addition. – Rick James Jun 10 '21 at 17:34