0

as a beginner, I'm struggling with this for a few hours now, any help is much appreciated :D

I have 3 tables:

threads (id, title)
tags (id, name) 
tag_thread (thread_id, tag_id)

The following, working code returns the 5 most used tags (names) of the latest 100 threads:

select ta.name
from tags ta
join tag_thread tt on tt.tag_id = ta.id
join (select * from threads order by id desc limit 100) th on tt.thread_id = th.id
group by ta.name
order by count(ta.name) desc
limit 5

Question: Given there is a $tagId, how would I have to change to query to filter the results so that only those tags show up, which are connected to threads that use tags with $tagId?

Thank you very much!

RiggsFolly
  • 93,638
  • 21
  • 103
  • 149
Hillcow
  • 890
  • 3
  • 19
  • 48
  • it will be helpful if you provide a sample of how you want your result to look like. – OLIVER.KOO Jul 05 '17 at 20:28
  • the result looks really simple: a single column 'name' containing the tags. the 5 most used tags of the latest 100 threads that use the tag with $tagid – Hillcow Jul 05 '17 at 20:31
  • Isn't that what you want? you said "to filter the results so that only those tags show up" you only got tags name in your result set at this point. – OLIVER.KOO Jul 05 '17 at 20:55
  • yes, but that's not what I want. you cut off my sentence there :D I want to filter for those tags that are connected by threads. it's quite hard to put in words. here is an example table: https://pastebin.com/hr78565F one thread can have mulitple tags, I want to get only the tags of threads that got the tag with $tagId, apart from the tag with $tagId itself. – Hillcow Jul 05 '17 at 21:04
  • so for example, if your current query returns you with `ta.name : a,b,c,d,e` with `ta.id: 1,2,3,4,5` you want to find these five tags thread.id and treads.title? – OLIVER.KOO Jul 05 '17 at 21:09
  • not quite, I want to look for threads that use a specific given $tagId and grab the tags of those threads. – Hillcow Jul 05 '17 at 21:19

3 Answers3

1

If it were me, I would take a slightly different approach to the problem. This should offer faster performance and is easier to read in my opinion:

SELECT * FROM (
  SELECT ta.name, COUNT(ta.name) AS tag_count
  FROM threads th
  JOIN tag_thread tt
    ON tt.tag_id = th.id
  JOIN tags ta
    ON ta.id = tt.thread_id
  GROUP BY ta.name
  ORDER BY th.id DESC LIMIT 100
)
ORDER BY tag_count DESC LIMIT 5

It's generally a good idea to eliminate subselects in joins and aggregates in the ORDER BY clause. If you had an index on ta.name, it would be ignored in your query and a full table scan would be performed regardless.

I saw your pastebin but I'm not entirely convinced I understand what you're trying to do. Please comment below if I misunderstood.

McGlothlin
  • 2,059
  • 1
  • 15
  • 28
  • It's such a simple problem but quite hard to put in words. I'll simply tell you what I want to achieve in the end, maybe that way you understand. Think of a website just like stackoverflow.com, where posts have tags. now you browse for posts with a specific tag. I want to display trending tags for this specific tag you are browsing. So I only want to show recent tags that are used by threads that use this very tag (with $tagId). – Hillcow Jul 05 '17 at 21:16
  • Can't you just add a `WHERE ta.id = 'foo'` after the `JOIN`s then? – McGlothlin Jul 05 '17 at 21:20
  • No, because then it would only return this one tag. But I don't want this one tag, I want the related tags. – Hillcow Jul 05 '17 at 21:24
  • I'm not sure what you mean by "related tags". Aren't you filtering by related tags by joining `threads` to `tag_thread`, then again to `tags` on id? To me, this indicates that `tag_thread` is an intermediary table that links a tag to a related thread based their respective ids. Is this not the case? – McGlothlin Jul 05 '17 at 21:26
  • 1
    How would we know what tags are *related*? Your schema doesn't include such information, nor does it include enough information to derive it. That part of your question is magic. – Jacobm001 Jul 05 '17 at 21:28
  • It's not magic, it's what I am trying to explain to you. I think I need multiple queries to do that. First, I need all the thread_id's where tag_id's = $tagId. And then, in the second query, I need the tag_id's where thread_id's = result of the first one. And then, back to the initial query I posted here, I have to figure out the order via recent activity. I'm just not sure how to do it in code now :D – Hillcow Jul 05 '17 at 21:37
  • Did you try my solution? If you tell me the difference between that and your expected outcome, it might be easier. I'm not trying to be difficult, just want to make sure we're on the same page. – McGlothlin Jul 05 '17 at 21:50
0

Try adding a where clause before "group by": WHERE ta.id IS NOT NULL.

See: Not equal <> != operator on NULL

Robert Peake
  • 118
  • 8
  • doesn't do anything unfortunately. I tried to explain myself a little better including an example table: https://stackoverflow.com/questions/44935400/double-join-nested-mysql-query?noredirect=1#comment76848129_44935400 – Hillcow Jul 05 '17 at 21:05
0

If i've understood correctly, I would recommend you create a temporary table to store the threads which have the tags you are interested in

    CREATE TEMPORARY TABLE tmp_threads AS
    SELECT distinct tt.thread_id
    FROM tag_thread tt
    WHERE tt.tag_id IN (123,46,12,121)
-- (put tag ids in above query)

next you change your query to something like this

select ta.name
from tags ta
join tag_thread tt on tt.tag_id = ta.id
join tmp_threads th on tt.thread_id = th.id
group by ta.name
order by count(ta.name) desc
limit 5
Omar Alves
  • 763
  • 5
  • 13
  • This is a really cool approach and I think you got it (apart from the fact that I only intend to use one tag_id, not multiple). I did not know about temporary tables. However, I wonder if there is another solution to it? And I can't get the temporary table to work on my local machine. It keeps telling me Table 'tmp_threads' doesn't exist. Even though I granted the privileges to myself. – Hillcow Jul 05 '17 at 22:04
  • Multiple tags comes as a bonus :) Are you sure you have issued the privilege CREATE TEMPORARY TABLES to your user/database? Please the take a look at https://dev.mysql.com/doc/refman/5.7/en/privileges-provided.html#priv_create-temporary-tables Another possible approach is to put SELECT distinct tt.thread_id FROM tag_thread tt WHERE tt.tag_id IN (123,46,12,121) as a join table in your original query, but it's not good because it make the query less readble and can raise some performance issues. Using temptables allows you to create indexes on them. – Omar Alves Jul 06 '17 at 13:18
  • 1
    Well, I tried. However, meanwhile I figured how to do it without a temporary table, just using WHERE IN and a self join :D – Hillcow Jul 06 '17 at 13:20
  • @Hillcow nice work. Can you please post your solution so that all may benefit? You are also allowed to accept your own answer if you want. – McGlothlin Jul 06 '17 at 16:39