-1

I am not able to get this working because of the DISTINCT keyword

SELECT 
    ROW_NUMBER() OVER() AS tweet_bucket,
    DISTINCT COUNT(user_id) AS users_num
FROM
    tweets
WHERE 
    tweet_date >= '2022-01-01 00:00:00' 
GROUP BY 
    user_id

I know if I remove the windows function and just put it as:

SELECT DISTINCT 
    COUNT(user_id) AS users_num
FROM 
    tweets
WHERE 
    tweet_date >= '2022-01-01 00:00:00' 
GROUP BY 
    user_id

it works

I cannot remove the windows function as I need to get a row_id using ROW_NUMBER()

Realy struggling, is there anyway of using the same logic and have the distinct?

Here is my data:

CREATE TABLE tweets 
(
    tweet_id INT,
    user_id INT,
    msg VARCHAR(500),
    tweet_date DATETIME
);
  
DROP TABLE tweets;
  
INSERT INTO tweets 
VALUES (214252, 111, "Am considering taking Tesla private at $420. Funding secured", '2021-12-30 00:00:00');

INSERT INTO tweets 
VALUES (739252, 111, "Despite the constant negative press covfefe", '2022-01-01 00:00:00');

INSERT INTO tweets 
VALUES (846402, 111, "Following @NickSinghTech on Twitter changed my life!", '2022-02-14 00:00:00');

INSERT INTO tweets 
VALUES (241425, 254, "If the salary is so competitive why won’t you tell me what it is?", '2022-03-01 00:00:00');

INSERT INTO tweets 
VALUES (231574, 148, "I no longer have a manager. I can't be managed", '2022-03-01 00:00:00');
  
SELECT * FROM tweets;

SELECT DISTINCT 
    COUNT(user_id) AS users_num,
    ROW_NUMBER() OVER() AS tweet_bucket
FROM 
    tweets
WHERE 
    tweet_date >= '2022-01-01 00:00:00' 
GROUP BY 
    user_id
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 2
    SELECT DISTINCT must come first. Just move the ROW_NUMBER call to after the DISTINCT. I question whether you really want DISTINCT, however. If you're actually counting distinct users then you might want COUNT(DISTINCT user_id), although I'm not sure that's entirely correct either because it doesn't make sense to count distinct user_id if you're grouping by user_id. And it doesn't make sense to get distinct counts either – Kurt Aug 31 '22 at 14:47
  • Also I question your terminology of row_id... using ROW_NUMBER() is not a method to obtain a row_id but instead just a row number based on the criteria such as the Order By or just the random order the database produced the dataset. If you are looking to generate keys for a dataset to INSERT into a table or 'something' use an IDENTITY column which auto generates the next key for that field OR you can make use of SEQUENCES. – Code Novice Aug 31 '22 at 14:50
  • It works by putting DISTINCT before ROW_NUMBER . But I do not get the right answer. I was expecting none of the fields in users_num column will be repeated but that is not the case. However, if I remove ROW_NUMBER() OVER() AS tweet_bucket it works perfectly i.e. I have tne column with no fields repeated but I do need the row_number column which is just a row id in asceding oder. – ayush_maheshwari Aug 31 '22 at 14:55
  • Also note, using ROW_NUMBER() is not something that you can rely on in order to produce the same number each time... new records in your database or a change in the query or Order By will result in different row numbers per the same records in the resultset. – Code Novice Aug 31 '22 at 14:56
  • A [mcve] is a great start when asking for SQL assistance. We don't know your data, and we don't know what you want... – jarlh Aug 31 '22 at 14:57
  • Which dbms are you using? – jarlh Aug 31 '22 at 14:57
  • I have put my data in a form. I am using sqlite – ayush_maheshwari Aug 31 '22 at 15:02
  • I have revised my question and put the dataI want to write a query to obtain a histogram of tweets posted per user in 2022. and Output the tweet count per user as the bucket, and then the number of Twitter users who fall into that bucket. – ayush_maheshwari Aug 31 '22 at 15:10

2 Answers2

1

You can use this in Oracle PL/SQL, maybe you should try OVER(ORDER BY user_id).

sql

SELECT DISTINCT COUNT(user_id) AS users_num, 
    ROW_NUMBER() OVER(ORDER BY user_id) AS tweet_bucket
FROM tweets
WHERE tweet_date >= to_date('2022-01-01','YYYY-MM-DD')
GROUP BY user_id
order by tweet_bucket

May this post/answer also help to get a better understanding: OVER clause in Oracle

Koushik Roy
  • 6,868
  • 2
  • 12
  • 33
0

I get the right answer with a CTE

WITH HistogramTweets AS   (   SELECT   DISTINCT COUNT(user_id) AS
users_num   FROM tweets   WHERE tweet_date >= '2022-01-01 00:00:00'  
GROUP BY user_id ) SELECT ROW_NUMBER() OVER() AS tweet_bucket,
users_num FROM HistogramTweets