1

I have a table as following (using bigquery):

id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
112 2020 11 3000 1
113 2020 11 1000 1

Is there a way in which I can select rows that have row numbers more than one?

For example, my desired output is:

id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2

I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.

The original code block I used for the first table result is:

SELECT 
    id, 
    year, 
    month, 
    SUM(sales) AS sales, 
    ROW_NUMBER() OVER (PARTITIONY BY id ORDER BY id ASC) AS row_number
FROM 
    table
GROUP BY 
    id, year, month
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

4 Answers4

2

You can use window functions:

select t.* except (cnt)
from (select t.*,
             count(*) over (partition by id) as cnt
      from t
     ) t
where cnt > 1;

As applied to your aggregation query:

SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month, 
             SUM(sales) as sales, 
             ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number
             COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
      FROM table
      GROUP BY id, year, month
     ) iym
WHERE cnt > 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

You can wrap your query as in below example

select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag 
  from (YOUR_ORIGINAL_QUERY)
)
where flag   

so it can look as

select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag 
  from (
    SELECT id, 
    year, 
    month, 
    SUM(sales) as sales, 
    ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
    FROM table
    GROUP BY id, year, month
  )
)
where flag   

so when applied to sample data in your question - it will produce below output

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

Try this:

with tmp as (SELECT id, 
year, 
month, 
SUM(sales) as sales, 
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)

It's a so clearly exists statement SQL

ElapsedSoul
  • 725
  • 6
  • 18
0

This is what I use, it's similar to @ElapsedSoul answer but from my understanding for static list "IN" is better than using "EXISTS" but I'm not sure if the performance difference, if any, is significant:

Difference between EXISTS and IN in SQL?

WITH T1 AS 
(
SELECT 
id, 
year, 
month, 
SUM(sales) as sales, 
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT * 
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);
JTD2021
  • 127
  • 2
  • 12