1

I am having issues with a SQL query that ideally should return all the comments to a thread in a forum.

Right now i'm having the following query:

    SELECT p.*, 'BBCode' AS Format,
        FROM_UNIXTIME(TIME) AS DateInserted,
        FROM_UNIXTIME(editTime) AS DateUpdated
    FROM et_post p
    LEFT JOIN et_conversation c ON c.conversationId = p.conversationId
    WHERE c.private = 0
    AND p.postId NOT IN (
        SELECT p.postId
        FROM et_conversation c
        LEFT JOIN et_post p ON p.conversationId = c.conversationId WHERE c.private = 0
        GROUP BY p.conversationId
        ORDER BY p.TIME
    )

This, however, returns 0 rows. I expect it to return around 8800 rows.

If I run the first part alone:

    SELECT p.*, 'BBCode' AS Format,
        FROM_UNIXTIME(TIME) AS DateInserted,
        FROM_UNIXTIME(editTime) AS DateUpdated
    FROM et_post p
    LEFT JOIN et_conversation c ON c.conversationId = p.conversationId
    WHERE c.private = 0

Output:

# postId, conversationId, memberId, time, editMemberId, editTime, deleteMemberId, deleteTime, title, content, attributes, Format, DateInserted, DateUpdated
'12', '5', '1', '1436600657', NULL, NULL, NULL, NULL, '', 'Content1', ?, 'BBCode', '2015-07-11 09:44:17', NULL
'13', '5', '1', '1436600681', NULL, NULL, NULL, NULL, 'Testing area', 'Content2', ?, 'BBCode', '2015-07-11 09:44:41', NULL
'14', '5', '1', '1436600698', NULL, NULL, NULL, NULL, 'Testing area', 'Content 3', ?, 'BBCode', '2015-07-11 09:44:58', NULL
'15', '5', '19', '1436602065', NULL, NULL, NULL, NULL, 'Testing area', 'More content', ?, 'BBCode', '2015-07-11 10:07:45', NULL
'16', '5', '19', '1436602093', NULL, NULL, NULL, NULL, 'Testing area', 'Even more content', ?, 'BBCode', '2015-07-11 10:08:13', NULL
'17', '5', '1', '1436602137', NULL, NULL, NULL, NULL, 'Testing area', 'Will it ever stop?', ?, 'BBCode', '2015-07-11 10:08:57', NULL
'54', '5', '1', '1436617274', NULL, NULL, NULL, NULL, 'Testing area', 'Ah, final one..', ?, 'BBCode', '2015-07-11 14:21:14', NULL

It returns 9304 rows like the above which sounds right.

Running the subquery alone:

        SELECT p.postId
        FROM et_conversation c
        LEFT JOIN et_post p ON p.conversationId = c.conversationId WHERE c.private = 0
        GROUP BY p.conversationId
        ORDER BY p.TIME

Output:

# postId
'12'
'18'
'19'
'44'
'70'
'73'
'75'

And it gives me 412 rows like the above which also sounds right.

Ideally, my output of the final query should look like this:

# postId, conversationId, memberId, time, editMemberId, editTime, deleteMemberId, deleteTime, title, content, attributes, Format, DateInserted, DateUpdated
'13', '5', '1', '1436600681', NULL, NULL, NULL, NULL, 'Testing area', 'Content2', ?, 'BBCode', '2015-07-11 09:44:41', NULL
'14', '5', '1', '1436600698', NULL, NULL, NULL, NULL, 'Testing area', 'Content 3', ?, 'BBCode', '2015-07-11 09:44:58', NULL
'15', '5', '19', '1436602065', NULL, NULL, NULL, NULL, 'Testing area', 'More content', ?, 'BBCode', '2015-07-11 10:07:45', NULL
'16', '5', '19', '1436602093', NULL, NULL, NULL, NULL, 'Testing area', 'Even more content', ?, 'BBCode', '2015-07-11 10:08:13', NULL
'17', '5', '1', '1436602137', NULL, NULL, NULL, NULL, 'Testing area', 'Will it ever stop?', ?, 'BBCode', '2015-07-11 10:08:57', NULL
'54', '5', '1', '1436617274', NULL, NULL, NULL, NULL, 'Testing area', 'Ah, final one..', ?, 'BBCode', '2015-07-11 14:21:14', NULL

(Notice postId 12 is gone)

[EDIT] From some quick head calculations I came up to the fact that the following query sounds right according to the number of rows returned:

    SELECT p.*, 'BBCode' AS Format,
        FROM_UNIXTIME(TIME) AS DateInserted,
        FROM_UNIXTIME(editTime) AS DateUpdated
    FROM et_post p
    INNER JOIN et_conversation c ON c.conversationId = p.conversationId
    WHERE c.private = 1
    AND p.postId NOT IN (
        SELECT DISTINCT po.conversationId
        FROM et_post po
    );

[EDIT2] Now with an sqlfiddle

Basically, I want the rows with id 12, 15 and 18 to be gone since they are the original posts created by the one who started the conversation.

[EDIT3] Now with an updated sqlfiddle

  • I dug some more into the database and figured out the first sqlfiddle wasn't 100% correct regarding how the data is in the database - therefore this updated version.
Frederik
  • 632
  • 4
  • 16
  • 34
  • Always check for nulls when using `not in`. Try adding `where postId is not null` to the subquery. – shawnt00 Jul 25 '15 at 18:24
  • 1
    Your LEFT JOINs are sabotaged by WHERE into INNER JOINs,if you want all your posts replace WHERE with AND – Mihai Jul 25 '15 at 18:25
  • 1
    P.postid will be null in any row preserved by the outer join in `SELECT p.postId FROM et_conversation c LEFT JOIN et_post p` . Plugging that into a `not in` will then return no rows. – Martin Smith Jul 25 '15 at 18:27
  • 2
    Your `group by` and `order by` in the subquery don't really make sense either. – shawnt00 Jul 25 '15 at 18:29
  • if you ever use NOT IN, and you don't know your data, or how it interacts with null, you are in great peril – Drew Jul 25 '15 at 18:29
  • 1
    Use `NOT EXISTS` instead of `NOT IN`. It has the right semantics. – Gordon Linoff Jul 25 '15 at 18:31
  • @GordonLinoff - I tried with `NOT EXISTS` but got the same output - 0 rows. I don't have any NULL's in the postId column anywhere. – Frederik Jul 25 '15 at 18:42
  • I have also tried various formats of the `is not null` to no avail. – Frederik Jul 25 '15 at 18:46
  • 2
    Why are you doing an outer join? As stated earlier they will be null in rows preserved by that. If there are no such rows why isn't it an inner join? – Martin Smith Jul 25 '15 at 18:46
  • @MartinSmith - I tried with an inner join too - also 0 rows.. Would you care write an example on how you would do it? – Frederik Jul 25 '15 at 18:51
  • Explain what the semantics of the query are supposed to be. You are also using Mysql's non standard and not guaranteed `group by` behaviour. – Martin Smith Jul 25 '15 at 18:54
  • It's not correlated so it's true that nulls would cause all the rows to disappear and not just some. Nevermind, I re-read and that's what it's doing. – shawnt00 Jul 25 '15 at 18:57
  • So it looks like you have posts that don't attach to conversations or something. It's not clear what you want to accomplish by the inner query. – shawnt00 Jul 25 '15 at 19:05
  • @shawnt00 - I haven't double-checked every single row in the database, but I would surely doubt that since I would have had a load of other problems then. – Frederik Jul 25 '15 at 19:12
  • @MartinSmith - I want to exclude every first post from a every thread and then only get the replies. The subquery is fetching all the original posts ID's and it is those I want to exclude from the resultset in my main query. – Frederik Jul 25 '15 at 19:12
  • 1
    But why are you joining the et_conversations? Its columns are never referenced. (except for the is_private, but the left join makes that irrelevant) – wildplasser Jul 25 '15 at 19:20
  • 1
    if you are stumped, then create a sqlfiddle with 100 rows of slimed non-sensitive data and let people wail on it. State what you expect the **Desired Results** to be – Drew Jul 25 '15 at 19:24
  • 100 rows would be too much TBH. It should be possible to provide dummy data with no more than 10 rows per table that demonstrates what you are trying to do. – Martin Smith Jul 25 '15 at 19:28
  • 1
    @wildplasser - Good point. The second c.private actually didn't make any sense. I think I got something now.. Will provide a sqlfiddle if it doesn't work out. I have edited the original questions with my findings so far – Frederik Jul 25 '15 at 19:29
  • 1
    sorry @MartinSmith, I was babbling binary, my 100 meant 4 – Drew Jul 25 '15 at 19:33
  • @DrewPierce - As you wish - I added a sqlfiddle to the question. – Frederik Jul 25 '15 at 19:59
  • @DrewPierce - The actual criteria is: The first post in a given conversationId group - so simple as that actually. Could also be rephrased as: **The lowest postId in a conversationId group** – Frederik Jul 25 '15 at 20:46
  • so you want to ALWAYS ditch the first one. Even if there is only one – Drew Jul 25 '15 at 20:50
  • @DrewPierce - Correct. Always. – Frederik Jul 25 '15 at 20:53

4 Answers4

1

Based on the provided SQLFiddle in the edited question, this works.

SELECT p.*, 'BBCode' AS Format,
FROM_UNIXTIME(TIME) AS DateInserted,
FROM_UNIXTIME(editTime) AS DateUpdated
FROM et_post p
INNER JOIN et_conversation c 
ON c.conversationId = p.conversationId
and c.private = 0
join (
select conversationId,min(postId) as m
from et_post
group by conversationId
) r
on r.conversationId = c.conversationId
where p.postId<>r.m

12,15,18 disappear as requested in your edit ... so too does NOT IN madness

Drew
  • 24,851
  • 10
  • 43
  • 78
  • I dug some more into the database and updated my question with a new sqlfiddle - your solution is unfortunately not perfect. – Frederik Jul 25 '15 at 20:41
  • so it's a new question is it ! – Drew Jul 25 '15 at 20:42
  • Well, no.. I just dug more into the database since the number of rows your snippet sent out didn't quite match my expectations. – Frederik Jul 25 '15 at 20:43
  • This does indeed look really promising.. Still have some small things to fix but none that are in scope of this question. Thanks! – Frederik Jul 25 '15 at 21:21
0

first of all , i get no idea what you are trying to achieve with this query, but i am going to tell you whats wrong with your query here ..

In the second part you are using the following query :

SELECT p.postId
    FROM et_conversation c
    LEFT JOIN et_post p ON p.conversationId = c.conversationId WHERE c.private = 0
    GROUP BY p.conversationId
ORDER BY p.TIME

If you run it separately it gives following :

12,15,18

Now remove the Order by and the Group by clause . The query is now:.

SELECT p.postId
FROM et_conversation c
LEFT JOIN et_post p ON p.conversationId = c.conversationId WHERE c.private = 0

If you run this query , you get the result :

12,13,14,15,16,17,18,19,20

which are all of the postid's . the sql is ignoring all of these post id's before grouping them.. because your Group by clause is logically wrong. you are grouping on a column and selecting another column which is not unique in this case.

whatever. the thing is , NOT IN clause is checking the postids before group by.

You need to change logic for your script.

captain_a
  • 3,328
  • 1
  • 14
  • 23
  • Can you give an example that works then? I cannot change the logic in the script that runs the sql code - it all has to be done in the SQL select part. – Frederik Jul 25 '15 at 20:42
0

The issue here appears very simple. Lets break down your statement:

 SELECT p.*, 'BBCode' AS Format,
        FROM_UNIXTIME(TIME) AS DateInserted,
        FROM_UNIXTIME(editTime) AS DateUpdated
    FROM et_post p
    LEFT JOIN et_conversation c ON c.conversationId = p.conversationId
    WHERE c.private = 0
    AND p.postId NOT IN (
        SELECT p.postId
        FROM et_conversation c
        LEFT JOIN et_post p ON p.conversationId = c.conversationId WHERE c.private = 0
        GROUP BY p.conversationId
        ORDER BY p.TIME
    )

In the first section you are pulling ALL rows from the et_post table and et_conversation table where c.private = 0

There is no other clauses.

Then in your NOT IN section, you are saying:

"Return ALL results where c.private=0"

And then of course it removes them from the outer result.

So what is happening here is:

a) You are returning all records in the outer statement b) the NOT IN is returning ALL statements based on the SAME WHERE conditions c) With every row matching, of course you get zero results

Sounds like you need to modify your subquery to what it is exactly that you don't want to see.

Walker Farrow
  • 3,579
  • 7
  • 29
  • 51
-1

omitting the unused et_conversations :

SELECT p.*
    , 'BBCode' AS Format
    , FROM_UNIXTIME(TIME) AS DateInserted
    , FROM_UNIXTIME(editTime) AS DateUpdated
FROM et_post p
WHERE EXISTS ( -- suppress the first one
   SELECT 1
   FROM et_post x
   WHERE x.postid = p.postid
   AND x.TIME < p.TIME
   );
wildplasser
  • 43,142
  • 8
  • 66
  • 109