Alternative heading: Subsetting a table before extracting a JSON element
I need to subset a very large table on BigQuery. The column that I will be filtering (joining) on to achieve this subsetting is not a JSON array. However, I would like to include/extract a complimentary column from a JSON array afterwards. No matter how I rearrange my query, it seems to process the full (i.e. non-subsetted) table when I include the extracted JSON element.
As a MWE, consider a query that I'm adapting/borrowing from @felipe-hoffa here:
SELECT id
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'
This query processes 33.9 MB.
However, if I add an extracted column from the JSON array (which again, I'm not subsetting on):
SELECT id, JSON_EXTRACT_SCALAR(payload, '$.size') AS size
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'
... then the process figure jumps to 3.5 GB (i.e. it is querying the whole table).
Any idea on how I could do this more efficiently and keep down per-query costs?