3

Consider the following "tweets" table

tweet_id  user_id  text
-----------------------------------------
1         1        look at my tweet
2         1        look at my tweet
3         1        a very different tweet
4         1        look at my tweet
5         1        look at my tweets
6         2        a cool tweet
7         2        this is my tweet
8         1        hello
9         1        hello

For each user, I want to count the number of duplicate tweets. In the example above, user_id 1 has a total of 5 tweets, of which 2 are unique (tweet_id 3 and 5) and 3 are duplicate (1, 2 and 4). So the outcome of the query for user 1 should be "3".

[EDIT] Look at user_id 1. The tweet "look at my tweet" appears 3 times, the tweet "hello" 2 times. The total number of duplicate tweets is then 3 + 2 = 5.

Pr0no
  • 3,910
  • 21
  • 74
  • 121
  • I can say that any of best solution will be very slow – safarov Mar 29 '12 at 11:36
  • Possible duplicate of [Finding duplicate values in a SQL table](https://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table) – tkruse Jan 15 '18 at 05:21

3 Answers3

4

For the first part, you can use the following query

select user_id, sum(count)
from
(
select user_id, text, count(tweet_id) count
from tweets 
group by 
user_id, text
having count(tweet_id) > 1
) t
group by user_id

The inner query finds all users and tweets that have occured more than once. The outer query adds up the duplicate values for each user

Chetter Hummin
  • 6,687
  • 8
  • 32
  • 44
  • Could you please explain your query? When I run this query for user 25 (a spam account), it returns 39,740; whereas Apurv Gupta's query returns 36,577 for the same user. I would like to figure out why by comparing the two queries. I'm asking him the same question :) – Pr0no Mar 29 '12 at 12:53
2

Try this:

Select count(text)-count(distinct text) from tweets where user_id=1
Apurv Gupta
  • 870
  • 7
  • 14
  • Could you please explain your query? When I run this query for user 25 (a spam account), it returns 36,577; whereas Amit Bhargava's query returns 39,740 for the same user. I would like to figure out why :) – Pr0no Mar 29 '12 at 12:52
  • Actually my query calculates number of duplicates, so my query will count "look at my tweet" as "one" legit entry and two others will be counted as duplicates, while Amit's query calculates number of texts that appeared more than once. My query would be extremely fast though. :) – Apurv Gupta Mar 29 '12 at 13:35
  • count(text) returns 7 and count(distinct text) returns 4 so the output is 3 here. But the expected output as per OP is 5. Can you please explain this query – Jayy Mar 29 '12 at 13:38
  • 1
    @Reveller: In simple English, this query counts messages when they start repeating. If there's no duplicates, it returns 0. If a message occurs twice, it will count as 1 (it is repeated once), if it occurs 3 times, the query will count it as 2 (repeated twice), and so on for every row. The results might or might not be of interest to you, but they are definitely not exactly what you are asking for, as far as I can understand your question. – Andriy M Mar 29 '12 at 15:40
0
select count(*) as count, text from table group by text order by user_id desc;

You will then need a server side function to group by user_id

Straseus
  • 478
  • 1
  • 4
  • 14
  • Please explain what you mean by a "server side function" to group by user_id. Why wouldn't "GROUP BY user_id" be unsuccesfull here? – Pr0no Mar 29 '12 at 12:54
  • I don't understand. I'm using SQL. I could call the query from within PHP if the output would need some processing, but I was hoping to do it all with one query since it works much faster imho. – Pr0no Mar 29 '12 at 13:08