0

I am currently working on a Postgres database with data for car tracking which looks similar to this:

+----+--------+------------+----------+
| id | car_id |    date    |   time   |
+----+--------+------------+----------+
| 11 |      1 | 2014-12-20 | 12:12:12 |
| 12 |      1 | 2014-12-20 | 12:12:13 |
| 13 |      1 | 2014-12-20 | 12:12:14 |
| 23 |      1 | 2015-12-20 | 23:42:10 |
| 24 |      1 | 2015-12-20 | 23:42:11 |
| 31 |      2 | 2014-12-20 | 15:12:12 |
| 32 |      2 | 2014-12-20 | 15:12:14 |
+----+--------+------------+----------+

Here is the setup:

CREATE TABLE test (
    id int
    , car_id int
    , date text
    , time text
);
INSERT INTO test VALUES
    (11, 1, '2014-12-20', '12:12:12'),
    (12, 1, '2014-12-20', '12:12:13'),
    (13, 1, '2014-12-20', '12:12:14'),
    (23, 1, '2015-12-20', '23:42:10'),
    (24, 1, '2015-12-20', '23:42:11'),
    (31, 2, '2014-12-20', '15:12:12'),
    (32, 2, '2014-12-20', '15:12:14');

I want to create a column where the traces are assigned a trip number sorted by id

id   car_id    date          time       (trip)
11   1         2014-12-20    12:12:12   1
12   1         2014-12-20    12:12:13   1
13   1         2014-12-20    12:12:14   1
23   1         2015-12-20    23:42:10   2   (trip +1 because time difference is bigger then 5 sec)
24   1         2015-12-20    23:42:11   2
31   2         2014-12-20    15:12:12   3   (trip +1 because car id is different)
32   2         2014-12-20    15:12:14   3          `

I have put op following rules

  • first row (lowest id) gets the value trip = 1

  • for the following rows: if car_id is equal to the row above and time difference between the row and the row above is smaller then 5 then trip is the same as the row above, else trip is the row above +1

I have tried with the following

Create table test as select
"id", "date", "time", car_id,

extract(epoch from "date" + "time") - lag(extract(epoch from "date" + "time")) over (order by "id") as diff,

Case
when t_diff < 5 and car_id - lag(car_id) over (order by "id") = 0
then lag(trip) over (order by "id")
else lag(trip) over (order by "id") + 1
end as trip

From road_1 order by "id"

but it does not work :( How can I compute the trip column?

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677

1 Answers1

0

First, use (date || ' ' || time)::timestamp AS datetime to form a timestamp out of date and time

SELECT id, test.car_id 
       , (date || ' ' || time)::timestamp AS datetime
FROM test 

which yields

| id | car_id | datetime            |
|----+--------+---------------------|
| 11 |      1 | 2014-12-20 12:12:12 |
| 12 |      1 | 2014-12-20 12:12:13 |
| 13 |      1 | 2014-12-20 12:12:14 |
| 23 |      1 | 2015-12-20 23:42:10 |
| 24 |      1 | 2015-12-20 23:42:11 |
| 31 |      2 | 2014-12-20 15:12:12 |
| 32 |      2 | 2014-12-20 15:12:14 |

It is helpful to do this since we'll be using datetime - prev > '5 seconds'::interval to identify rows which are 5 seconds apart. Notice that 2014-12-20 23:59:59 and 2014-12-21 00:00:00 are 5 seconds apart but it would be difficult/tedious to determine this if all we had were separate date and time columns.

Now we can express the rule that the trip is increased by 1 when

NOT ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))

(More on why the condition is expressed in this seemingly backwards way, below).

SELECT id, car_id, prev_car_id, datetime, prev_date
    , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM ( 
    SELECT id, car_id, datetime
        , lag(datetime) OVER () AS prev_date
        , lag(car_id) OVER () AS prev_car_id
    FROM (
        SELECT id, car_id 
               , (date || ' ' || time)::timestamp AS datetime
        FROM test ) t1
    ) t2

yields

| id | car_id | prev_car_id | datetime            | prev_date           | new_trip |
|----+--------+-------------+---------------------+---------------------+----------|
| 11 |      1 |             | 2014-12-20 12:12:12 |                     |        1 |
| 12 |      1 |           1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |        0 |
| 13 |      1 |           1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |        0 |
| 23 |      1 |           1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |        1 |
| 24 |      1 |           1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |        0 |
| 31 |      2 |           1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |        1 |
| 32 |      2 |           2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |        0 |

Now trip can be expressed as the cumulative sum over the new_trip column:

SELECT id, car_id, datetime, sum(new_trip) OVER (ORDER BY datetime) AS trip
FROM (
    SELECT id, car_id, prev_car_id, datetime, prev_date
        , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
    FROM ( 
        SELECT id, car_id, datetime
            , lag(datetime) OVER () AS prev_date
            , lag(car_id) OVER () AS prev_car_id
        FROM (
            SELECT id, car_id 
                   , (date || ' ' || time)::timestamp AS datetime
            FROM test ) t1
        ) t2

    ) t3

yields

| id | car_id | datetime            | trip |
|----+--------+---------------------+------|
| 11 |      1 | 2014-12-20 12:12:12 |    1 |
| 12 |      1 | 2014-12-20 12:12:13 |    1 |
| 13 |      1 | 2014-12-20 12:12:14 |    1 |
| 31 |      2 | 2014-12-20 15:12:12 |    2 |
| 32 |      2 | 2014-12-20 15:12:14 |    2 |
| 23 |      1 | 2015-12-20 23:42:10 |    3 |
| 24 |      1 | 2015-12-20 23:42:11 |    3 |

I used

(CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END)

instead of

(CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END)

because prev_car_id and prev_date may be NULL. Thus, on the first row, (car_id != prev_car_id) returns NULL when instead we want TRUE. By expressing the condition in the opposite way, we can identify the unintersting rows correctly:

((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))

and use the ELSE clause to return 1 when the condition is TRUE or NULL. You can see the difference here:

SELECT id  
    , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
    , (CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END) AS new_trip_wrong
    , car_id, prev_car_id, datetime, prev_date
FROM ( 
    SELECT id, car_id, datetime
        , lag(datetime) OVER () AS prev_date
        , lag(car_id) OVER () AS prev_car_id
    FROM (
        SELECT id, car_id 
               , (date || ' ' || time)::timestamp AS datetime
        FROM test ) t1
    ) t2

yields

| id | new_trip | new_trip_wrong | car_id | prev_car_id | datetime            | prev_date           |
|----+----------+----------------+--------+-------------+---------------------+---------------------|
| 11 |        1 |              0 |      1 |             | 2014-12-20 12:12:12 |                     |
| 12 |        0 |              0 |      1 |           1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |
| 13 |        0 |              0 |      1 |           1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |
| 23 |        1 |              1 |      1 |           1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |
| 24 |        0 |              0 |      1 |           1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |
| 31 |        1 |              1 |      2 |           1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |
| 32 |        0 |              0 |      2 |           2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |

Note the difference in the new_trip versus new_trip_wrong columns.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677