I have the following tables:
SELECT * FROM labels LIMIT 5;
user_id | session_id | start_timestamp | end_timestamp | travelmode
---------+------------+------------------------+------------------------+------------
11 | 0 | 2007-06-26 12:32:29+01 | 2007-06-26 12:40:29+01 | bus
11 | 0 | 2008-03-31 17:00:08+01 | 2008-03-31 17:09:01+01 | taxi
11 | 0 | 2008-04-01 01:48:32+01 | 2008-04-01 01:59:23+01 | taxi
11 | 0 | 2008-04-01 02:00:22+01 | 2008-04-01 02:08:13+01 | walk
11 | 0 | 2008-04-01 12:22:47+01 | 2008-04-01 12:28:39+01 | taxi
(5 rows)
SELECT * FROM trajectories LIMIT 5;
user_id | session_id | timestamp | lat | lon | alt
---------+-------------------+------------------------+-----------+------------+------
11 | 10020080330004134 | 2008-03-30 00:41:34+00 | 36.032647 | 103.850612 | -777
11 | 10020080330004134 | 2008-03-30 00:42:05+00 | 36.031563 | 103.851273 | -777
11 | 10020080330004134 | 2008-03-30 00:43:04+00 | 36.028623 | 103.853238 | -777
11 | 10020080330004134 | 2008-03-30 00:44:03+00 | 36.027323 | 103.854475 | -777
11 | 10020080330004134 | 2008-03-30 00:45:02+00 | 36.025775 | 103.854993 | -777
(5 rows)
So I want to update the session_id
column of the labels
table(initially all zero):
UPDATE labels
SET session_id=trajectories.session_id
FROM trajectories
WHERE
trajectories.user_id = labels.user_id
AND trajectories.timestamp >= labels.start_timestamp
AND trajectories.timestamp <= labels.end_timestamp;
UPDATE 4500
However, not all columns of the labels
table were updated (about 30% only) after about 5-minutes of query execution:
SELECT COUNT(*) FROM labels WHERE session_id=0;
count
-------
10217
(1 row)
If it might helps, more about the tables:
\d labels
Table "akil.labels"
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+---------
user_id | integer | | not null |
session_id | bigint | | |
start_timestamp | timestamp with time zone | | not null |
end_timestamp | timestamp with time zone | | not null |
travelmode | text | | |
Indexes:
"mode_pkey" PRIMARY KEY, btree (user_id, start_timestamp, end_timestamp)
\d trajectories
Table "akil.trajectories"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
user_id | integer | | |
session_id | bigint | | not null |
timestamp | timestamp with time zone | | not null |
lat | double precision | | not null |
lon | double precision | | not null |
alt | double precision | | |
Indexes:
"trajectory_pkey" PRIMARY KEY, btree (session_id, "timestamp", lat, lon)
EDIT
Adding index to trajectories
table:
CREATE INDEX idx_trecj ON trajectories (user_id, timestamp);
CREATE INDEX
UPDATE labels
SET session_id=trajectories.session_id
FROM trajectories
WHERE
trajectories.user_id = labels.user_id
AND trajectories.timestamp >= labels.start_timestamp
AND trajectories.timestamp <= labels.end_timestamp;
UPDATE 4500
SELECT COUNT(*) FROM labels WHERE session_id=0;
count
-------
10217
(1 row
Yet, not all session_id
in labels
are updated (same as the initial operation).
EDIT
Result of queries from the answer by @Erwin (this shows many none):
SELECT count(*) AS no_match_in_trajectories
FROM akil.labels l
WHERE NOT EXISTS (
SELECT FROM akil.trajectories t
WHERE t.user_id = l.user_id
AND t.timestamp >= l.start_timestamp
AND t.timestamp <= l.end_timestamp
);
no_match_in_trajectories
--------------------------
10217
(1 row)
In the second query (suggested way to do the update, none was updated);
UPDATE akil.labels l
SET session_id = t.session_id
FROM akil.labels l1
CROSS JOIN LATERAL (
SELECT t1.session_id
FROM akil.trajectories t1
WHERE t1.user_id = l1.user_id
AND t1.timestamp >= l1.start_timestamp
AND t1.timestamp <= l1.end_timestamp
ORDER BY t1.timestamp -- pick the earliest matching entry
LIMIT 1
) t
WHERE ( l.user_id, l.start_timestamp, l.end_timestamp) -- PK
= (l1.user_id, l1.start_timestamp, l1.end_timestamp)
AND l.session_id IS DISTINCT FROM t.session_id;
UPDATE 0