I highly appreciate any suggestions to optimise the Query below
as this Query is timing out in the State it is as of right now. As BikeLogsTable
contains a about 10000 - 1 Mil Rows which can be Joined on each Row and as it is, this Query is timing out after 6 hours.
Note that the core goal here is to associate each Row of BikesTable with the correlating Row in States. In this step I'm attempting to determine the associated Row of BikeLogTable and retrieve the Row number to determine the last associated entry.
Select Bikes.vehicleId,Bikes.timestamp_field_24,States.to,States.vehiclestatechangeid
ROW_NUMBER() OVER (
PARTITION BY Bikes.timestamp_field_24,Bikes.vehicleId
ORDER BY States.timestamp
)
from `BikesTable` as Bikes
Right Join `BikeLogsTable` as States
on Bikes.vehicleId = CAST(States.vehicleid as String)
WHERE
DATE_SUB( PARSE_TIMESTAMP("%F %T", REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(Bikes.timestamp_field_24, "\\+.*",""), "\\..*", " "), "Z", " "), "T", " ")) , INTERVAL 1 DAY) < States.timestamp AND
PARSE_TIMESTAMP("%F %T", REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(Bikes.timestamp_field_24, "\\+.*",""), "\\..*", " "), "Z", " "), "T", " ")) > States.timestamp AND
States.to is not null and
PARSE_TIMESTAMP("%F %T", REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(Bikes.timestamp_field_24, "\\+.*",""), "\\..*", " "), "Z", " "), "T", " ")) > "2021-11-22" and
PARSE_TIMESTAMP("%F %T", REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(Bikes.timestamp_field_24, "\\+.*",""), "\\..*", " "), "Z", " "), "T", " ")) < "2021-11-24"
Execution Details:
As this is a one time task, I'm happy for suggestions on whacky workarounds.