Selecting rows that have row_number more than 1

Question

I have a table as following (using bigquery):

id	year	month	sales	row_number
111	2020	11	1000	1
111	2020	12	2000	2
112	2020	11	3000	1
113	2020	11	1000	1

Is there a way in which I can select rows that have row numbers more than one?

For example, my desired output is:

id	year	month	sales	row_number
111	2020	11	1000	1
111	2020	12	2000	2

I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.

The original code block I used for the first table result is:

SELECT 
    id, 
    year, 
    month, 
    SUM(sales) AS sales, 
    ROW_NUMBER() OVER (PARTITIONY BY id ORDER BY id ASC) AS row_number
FROM 
    table
GROUP BY 
    id, year, month

score 2 · Accepted Answer · answered Dec 17 '20 at 02:19

You can use window functions:

select t.* except (cnt)
from (select t.*,
             count(*) over (partition by id) as cnt
      from t
     ) t
where cnt > 1;

As applied to your aggregation query:

SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month, 
             SUM(sales) as sales, 
             ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number
             COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
      FROM table
      GROUP BY id, year, month
     ) iym
WHERE cnt > 1;

omg. thank you so much. you're a life saver. :D – hustlecorean Dec 17 '20 at 02:30 — hustlecorean, Dec 17 '20 at 02:30

score 1 · Answer 2 · answered Dec 17 '20 at 02:30

You can wrap your query as in below example

select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag 
  from (YOUR_ORIGINAL_QUERY)
)
where flag

so it can look as

select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag 
  from (
    SELECT id, 
    year, 
    month, 
    SUM(sales) as sales, 
    ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
    FROM table
    GROUP BY id, year, month
  )
)
where flag

so when applied to sample data in your question - it will produce below output

score 1 · Answer 3 · answered Dec 17 '20 at 02:37

Try this:

with tmp as (SELECT id, 
year, 
month, 
SUM(sales) as sales, 
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)

It's a so clearly exists statement SQL

score 0 · Answer 4 · answered Oct 22 '22 at 23:09

This is what I use, it's similar to @ElapsedSoul answer but from my understanding for static list "IN" is better than using "EXISTS" but I'm not sure if the performance difference, if any, is significant:

Difference between EXISTS and IN in SQL?

WITH T1 AS 
(
SELECT 
id, 
year, 
month, 
SUM(sales) as sales, 
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT * 
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);

Selecting rows that have row_number more than 1

4 Answers4