5

I'm working in BigQuery. I have a table t1 which has address, postcode, price and date fields. I want to group this by address and postcode, an find the price of the most recent row for each address.

How can I do this in BigQuery? I know how to get the address, postcode and most recent date:

SELECT
  ADDRESS, POSTCODE, MAX(DATE)
FROM
  [mytable]
GROUP BY
  ADDRESS,
  POSTCODE

But I don't know how to get the price of these rows matching these fields. This is my best guess, which does produce results - will this be correct?

SELECT 
  t1.address, t1.postcode, t1.date, t2.price
FROM [mytable] t2
JOIN 
(SELECT
  ADDRESS, POSTCODE, MAX(DATE) AS date
FROM
  [mytable]
GROUP BY
  ADDRESS,
  POSTCODE) t1
ON t1.address=t2.address 
   AND t1.postcode=t2.postcode 
   AND t1.date=t2.date

This seems to me like it should work, but some of the similar questions have solutions that are much more complex.

Community
  • 1
  • 1
Richard
  • 62,943
  • 126
  • 334
  • 542

2 Answers2

8

Just use row_number():

SELECT t.*
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY ADDRESS, POSTCODE
                                ORDER BY DATE DESC
                               ) as seqnum
      FROM [mytable] t
     ) t
WHERE seqnum = 1;

This is not an aggregation query. You want to filter the rows to get the most recent value.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks! Out of interest, why is this preferable to my method? – Richard Apr 12 '17 at 11:35
  • 1
    Your method doesn't let you retrieve the price associated with the maximum date unless you do a self join, whereas this one doesn't require a join. You could also take a look at my answer in http://stackoverflow.com/a/43340019/6253347 as an example of using aggregate functions to select the most recent row. – Elliott Brossard Apr 12 '17 at 11:53
5

Try below for BigQuery Standard SQL

#standardSQL
SELECT row.* FROM (
  SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
  FROM `yourTable` AS t
  GROUP BY address, postcode
)  

You can play/test it with dummy data as below

#standardSQL
WITH yourTable AS (
  SELECT 'address_1' AS address, 'postcode_1' AS postcode, '2017-01-01' AS date, 1 AS price UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-02', 2 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-03', 3 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-04', 4 UNION ALL
  SELECT 'address_2', 'postcode_2', '2017-01-01', 5 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-01', 6 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-02', 7 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-03', 8 
)
SELECT row.* FROM (
  SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
  FROM `yourTable` AS t
  GROUP BY address, postcode
)
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230