5

This is a bit challenging but fun question. Consider having these tables

tweets

tweet_id | retweet_of_id | user_id

follow

user_id | followed_user_id

So we store each "retweet as a separate tweet" pointing to the original tweet's id (retweet_of_id). This is because I want to have comments under each one separately. If something is not a retweet then retweet_of_id will be 0.

How do I retrieve the following using MySQL efficiently?

  • My own tweets
  • All the original tweets (from users that I follow)
  • And the first retweet (by a user I follow) of a tweet (from a user that I don't follow)

And that the result should a combination of both (in order) just like how twitter does it.
Please consider that there may be 1,000,000 tweets and we only need the most recent ones (e.g.: 10).


Here is an example (I'm user 1 and I follow user 2 & 3)

tweet_id | retweet_of_id | user_id
----------------------------------
    1            0            4          <- EXCLUDE (I don't follow user 4)
    2            0            2          <- INCLUDE (I follow user 2)
    3            0            3          <- INCLUDE (I follow user 3)
    4            1            2          <- INCLUDE (I follow user 2 & first RT)
    5            1            3          <- EXCLUDE (I already have the first RT)
    6            2            3          <- EXCLUDE (I already have the orignal)
    7            0            1          <- INCLUDE (My own tweet)

So the final order should be these tweets: 7, 4, 3, 2 (starting with the most recent)

tradecenter
  • 125
  • 5

2 Answers2

1

Here's how I solved it
(both of these assume that the tweets are ordered by their tweet_id ASC)

Solution 1 (correct, runs fast)

SELECT tweet_id,
FROM tweets 
WHERE user = 1 OR user IN (2,3)  
GROUP BY  IF(retweet_of_id = 0, tweet_id, retweet_of_id)
ORDER BY tweet_id DESC

Solution 2 (gives correct results, but it's dog slow for 1,000,000 tweets)

SELECT p1.tweet_id FROM tweets p1 
LEFT JOIN tweets p2 
       ON p2.user IN (2,3)
      AND p1.tweet_id > p2.tweet_id
      AND (p1.retweet_of_id = p2.tweet_id 
           OR p1.retweet_of_id AND p1.retweet_of_id = p2.retweet_of_id )
WHERE p2.tweet_id IS NULL
  AND (p1.user = 1 OR p1.user IN (2,3)) 
ORDER BY p1.tweet_id DESC
tradecenter
  • 125
  • 5
0

all the original tweets (from users that I follow)

1 users that i follow:

select user_id from follow where followed_user_id= MyOwnID

2 all the original tweets:

select * from tweets where retweed_of_id=0

both combined:

select * from tweets where retweed_of_id=0 and
user_id in (select user_id from follow where followed_user_id= MyOwnID)

that should be it - or did i miss something?

Community
  • 1
  • 1
wegus
  • 282
  • 1
  • 9
  • Yes I know how to get these. The tricky part is the second point about the **first** retweet (by a user I follow) of a tweet (from a user that I don't follow). And that the result should a combined list (in order) with the original tweets. Just like how twitter does it. – tradecenter Nov 23 '12 at 13:09
  • you will need an autoincrement id for that, have to modify my upper example to retweed_of_id<>0 and have to limit the answer to 1 ( so you just get the first re_tweets of users you know). Having another subselect you can limit that to tweets of people you don't know! – wegus Nov 23 '12 at 13:45
  • Can you modify your answer so I can see what you mean? I also added an example. Maybe it helps. – tradecenter Nov 23 '12 at 14:06