2

Currently for my website, my articles have tags on them.

When I loop through the list of articles, for each one I do a mysql query to gather a list of category tags for them like so:

SELECT c.`category_name`, c.`category_id` 
FROM `articles_categorys` c 
    INNER JOIN `article_category_reference` r ON c.category_id = r.category_id 
WHERE r.article_id = 8136 
ORDER BY r.`category_id` = *IDHERE* DESC, r.`category_id` ASC 
LIMIT 4

Now, imagine a page where I have say 30 articles shown at once, the above query would be done 30 times. I imagine that's terrible.

What I decided to do was make an array of all the article_ids on a page and then do this instead:

SELECT c.`category_name`, c.`category_id`, r.article_id 
FROM `articles_categorys` c 
INNER JOIN `article_category_reference` r ON c.category_id = r.category_id 
WHERE r.article_id = 8136 OR r.article_id = 8130 OR r.article_id = 8127 
  OR r.article_id = 8125 OR r.article_id = 8123 OR r.article_id = 8120 
  OR r.article_id = 8119 OR r.article_id = 8117 OR r.article_id = 8116 
  OR r.article_id = 8112 OR r.article_id = 8107 OR r.article_id = 8106 
  OR r.article_id = 8037 OR r.article_id = 8104 OR r.article_id = 8103

I then use PHP just to loop through the array found from the above, match the article_id to the current article to them and if it matches echo the category_name. The only problem is now I can't limit it to 4 per article_id.

Is the newer approach better, worse, or do both suck horribly?

Example data to use:

CREATE TABLE `articles_categorys` (
  `category_id` int(11) NOT NULL,
  `category_name` varchar(32) COLLATE utf8_bin NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `article_category_reference` (
  `ref_id` int(11) NOT NULL,
  `article_id` int(11) NOT NULL,
  `category_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `articles_categorys` (`category_id`, `category_name`) VALUES
(22, 'Site Info'),
(1, 'Editorial'),
(2, 'Review'),
(3, 'Interview'),
(4, 'Game Sale'),
(5, 'Steam'),
(6, 'Indie Game'),
(7, 'Crowdfunding'),
(8, 'Game Bundle'),
(9, 'Free Game'),
(10, 'MMO');

INSERT INTO `article_category_reference` (`ref_id`, `article_id`, `category_id`) VALUES
(15, 6231, 22),
(14, 6231, 1),
(16, 6231, 2),
(17, 6231, 3),
(18, 6231, 4),
(19, 6231, 9),
(20, 6231, 10);
NaughtySquid
  • 1,947
  • 3
  • 29
  • 44
  • Try `WHERE r.article_id IN (8136 , 8130, 8127, .......)` – RiggsFolly Sep 20 '16 at 23:09
  • Have you heart about IN() ? – jakub wrona Sep 20 '16 at 23:09
  • SQL query is usually the bottleneck of any webpage, so having one big (but smart!) query is often the best solution. Don't forget to `ORDER BY r.article_id` to group the tags in the result, it will saves code. – Proger_Cbsk Sep 20 '16 at 23:13
  • Okay so I switched over to IN as suggested, and phpmyadmin shows that makes the query run quite a lot faster so thanks. I still have the issue of not being able to limit to 4 per ID. Would it be best to do that in the PHP itself to stop after 4 for each one in my loop? – NaughtySquid Sep 20 '16 at 23:14
  • Use php to limit to 4, pure MySQL you'll have to create a function anyway. – Proger_Cbsk Sep 20 '16 at 23:25

2 Answers2

1

With SQL, it's always good to benchmark and do an EXPLAIN on your queries to see which will be faster. But as a rule of thumb, IN will probably outperform multiple ORs. That still leaves you with the problem of limiting the results to four each. Contrary to what's been said in the comments, it's not a good idea to fetch all the data and then choose only 4 items using PHP. You could potentially be retrieving thousands of rows only to display a hundred.

SET @rank=null, @val=null;
SELECT * FROM (

    SELECT c.`category_name`, c.`category_id`, r.article_id,
    @rank := IF(@val =  r.article_id, @rank+1,1) as rank,
    @val := r.article_id  
    FROM `articles_categorys` c 
    INNER JOIN `article_category_reference` r ON c.category_id = r.category_id 
    WHERE r.article_id IN (8136, /* fill this up accordingly */)

    ORDER BY r.`category_id` ASC
) AS a WHERE rank < 5;

For more details, see this excellent Q&A: Using LIMIT within GROUP BY to get N results per group?

In short, what this query does is to asign a rank to each record. At the start the rank is null, for the first row it will be set to null. While the r.category_id remains the same it will keep on increasing but would drop back to one when the category_id changes.

Community
  • 1
  • 1
e4c5
  • 52,766
  • 11
  • 101
  • 134
  • That's really interesting, never seen anything like that before! You've given me something new to look into, and it seems to work exactly how I want it so thanks I will consider this the correct solution. – NaughtySquid Sep 21 '16 at 11:22
  • In fact, I'm doing some actual testing on live data now, and it doesn't seem to limit it to 4. I took out "ORDER BY r.`category_id` = *IDHERE* DESC" because I don't know what you were meaning by that, we don't have the category id, we are grabbing it in the query itself. It seems to give all of them a rank of 1. – NaughtySquid Sep 21 '16 at 11:29
  • What's a good way to get some example data from my table into my question with the best formatting? – NaughtySquid Sep 21 '16 at 11:36
  • Okay, added in some example data, maybe that will help you :) – NaughtySquid Sep 21 '16 at 11:46
  • Updated my answer again. Looking at your data, it seems that the rank should have been based on article_id rather than category_id as i had initially thought. – e4c5 Sep 21 '16 at 12:36
  • Yeah I tinkered around with it for ages and figured that out eventually, many thanks for the great answer and the help, saved me so much time and this query runs damn fast too thanks to my indexes. – NaughtySquid Sep 22 '16 at 09:23
  • Glad to have been of help. All the best with your project. – e4c5 Sep 22 '16 at 09:24
0

sometimes not enough to just make good query, you must use SQL options. To make your queries faster you should use indices (such as hash or btree) or unique values https://www.tutorialspoint.com/mysql/mysql-indexes.htm .

Jorge T
  • 121
  • 1
  • 2
  • 9