I have a date(createDt
) partitioned BQ table that is being loaded by kafka stream data. I want to select only the latest record based on updatedTime
grouped by ID
column. I am able to achieve it through the 2nd top solution referred here. I want to create it as a view and did it by the following code
create or replace view myView as
SELECT row.* FROM (
SELECT ARRAY_AGG(t ORDER BY updatedTime DESC LIMIT 1)[OFFSET(0)] AS row
FROM `yourTable` AS t
GROUP BY ID
)
I noticed that if I query myView
using the partitioned date column createDt
in where condition, the entire base table is being scanned which I don't want to happen.
However if I a create a normal view like
create or replace view myNormalView as
SELECT * FROM `yourTable`
and then query the using createDt
, only the required partition is being scanned and very less bytes are read/billed.
My question - How do I make partition work on top of ARRAY_AGG(t ORDER BY updatedTime DESC LIMIT 1)[OFFSET(0)]
.
I am still a beginner in BigQuery. Any suggestion/solution is highly appreciated