2

I have two tables in BigQuery, one is a journey_times table with a schema:

  1. journey_id (int)
  2. vehicle_id (string)
  3. start (timestamp yyyy-mm-dd hh:mm:ss)
  4. finish (timestamp yyyy-mm-dd hh:mm:ss)

Sample data:

- journey_id¦ vehicle¦ start¦ finish - 1¦ car1¦ 2017-12-05 01:33:44 UTC¦ 2017-12-05 01:53:14 UTC - 2¦ car2¦ 2017-12-05 03:04:18 UTC¦ 2017-12-05 03:28:49 UTC

and the other is a distance table with a schema of:

  1. vehicle_id (string)
  2. timestamp (timestamp yyyy-mm-dd hh:mm:ss)
  3. value (float)

Sample data:

- vehicle¦ timestamp¦ value - car3¦ 2016-08-30 17:36:52 UTC¦ 0.01635375 - car3¦ 2016-08-30 17:36:53 UTC¦ 0.02862375

What I want to do is query the distance table and find the maximum and minimum value for each of the journey rows in the journey_times table (with the same vehicle ID and between the start and finish timestamps) to end up with a table that can be joined onto the journey table and look like:

  1. journey_id
  2. max_distance_value
  3. min_distance_value

How would you write this in in BigQuery standard or legacy SQL?
My attempt below doesn't produce any results. `

WITH
  distance_table AS (
  SELECT
    vehicle,
    timestamp,
    value
  FROM
    'project.trip_distance' ),
  journey_table AS (
  SELECT
    journey_id,
    vehicle,
    start,
    finish        
  FROM
    'project.journey_times')
SELECT
  MIN(distance_table.value)
FROM
  distance_table JOIN journey_table
  using (vehicle)
WHERE
  distance_table.vehicle = journey_table.vehicle
  AND distance_table.timestamp BETWEEN journey_table.start
  AND journey_table.finish
Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
Mithras
  • 203
  • 3
  • 9
  • 1
    What have you tried so far? Please edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code that you are having problems with, then we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask). Btw, answering questions like yours as it is now - considered quite a bad taste here on SO, so please consider updating it and showing some efforts to solve it before asking – Mikhail Berlyant Jun 04 '18 at 18:15
  • Sample data and desired results would really help. – Gordon Linoff Jun 04 '18 at 18:31
  • Apologies, @MikhailBerlyant , thank you for the links. I had (wrongly) thought my previous attempts were way off the mark and wouldn't be of much use. Did my best to format the sample values legibly. Gordon Linoff code is exactly what I was looking for, I had just over complicated the problem in my head. Is my submission still subpar? – Mithras Jun 04 '18 at 22:25

1 Answers1

3

If I understand correctly, this is a join and group by:

select j.journey_id, min(value), max(value)
from journey_times jt join
     vehicles v
     on jt.vehicle_id = v.vehicle_id and
        v.timestamp between jt.start and jt.finish
group by j.journey_id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for your submission even though I hadn't provided previous examples of code or sample data. I had over complicated the problem in my head and your submission was perfect. Had one attempt which was almost the same but it didn't work making me think I just didn't understand SQL. – Mithras Jun 04 '18 at 21:31