1

How to select any single full row from a group_by group in bigquery.

Given a table foo with columns [a, b, c, d, e, f] I would like to select any full row for each value of a. What would be a reasonable syntax here?

Existing question addresses selecting only a single column, not a whole row. PostgreSQL allows to do it using DISTINCT ON (a) but this is not available in BQ.

y.selivonchyk
  • 8,987
  • 8
  • 54
  • 77
  • from the linked question although it selects one column, with partition by you should be able to select *, row_number() over (partition by a order by a) in bigquery. If you can give sample input and output you expect will be make it easy to provide appropriate response. – Bihag Kashikar May 15 '23 at 23:16
  • 1
    What would be the difference between simply filtering `a` from `UNNEST`ed elements from grouped columns and what you want? `WITH dataset AS (select ['a', 'b', 'c', 'd', 'e', 'f'] as group_column) SELECT * FROM dataset as d WHERE ('a' in UNNEST(d.group_column))` – Jiho Choi May 16 '23 at 02:12
  • You might want to check QUALIFY with [`ROW_NUMBER`](https://stackoverflow.com/questions/70148035/what-is-the-advantage-of-using-bigquery-s-qualify-operator/70149163#70149163). – Jiho Choi May 17 '23 at 02:28

2 Answers2

0

This is not great (for example, any non-deterministic filtering [like LIMIT 1000] on the first table would brake the script because double querying the table, but it does the job:

WITH
numbered as (
   SELECT *, ROW_NUMBER() OVER (ORDER BY a) AS row_num 
   FROM `foo` ORDER BY a 
),

firsts as (
   select ANY_VALUE(row_num) as row_num from numbered
   group by job_id
),

single_record_per_group as (
   select a.* 
   from numbered as a
   inner join firsts as b
   on a.row_num = b.row_num 
)

SELECT * FROM single_record_per_group

If another column contains unique values only, it is better to use it instead of row_num

y.selivonchyk
  • 8,987
  • 8
  • 54
  • 77
0

This will give you a row number for each value of A

SELECT a, b, c, d, e, f, ROW_NUMBER() OVER (PARTITION BY A) AS RN
FROM some_table_you_did_not_name

You could also use an ORDER BY clause if you have a requirement to pick the first one

SELECT a, b, c, d, e, f, ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS RN
FROM some_table_you_did_not_name

Now just take the first one.

SELECT a, b, c, d, e, f 
FROM (
  SELECT a, b, c, d, e, f, ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS RN
  FROM some_table_you_did_not_name
) X
WHERE RN = 1
Hogan
  • 69,564
  • 10
  • 76
  • 117