I'm looking to JOIN
data from a change-log style table of users into a table of events which which have matching ids
The tables are as follows:
project_events
schema
timestamp TIMESTAMP
event_id STRING
user_id STRING
data STRING
example data
| timestamp | event_id | user_id | data |
|-----------------------------|-----------|---------|------------------|
| 2020-08-22 17:01:18.807 UTC | hHZuTE8Y= | ABC123 | {"some":"json" } |
| 2020-08-20 16:57:28.022 UTC | tF5Gky8Q= | ZXY432 | {"foo":"item" } |
| 2020-08-15 16:44:25.607 UTC | 1dOU8pOo= | ABC123 | {"bar":"val" } |
users_changelog
schema
timestamp TIMESTAMP
event_id STRING
operation STRING
user_id STRING
data STRING
example data
| timestamp | event_id | operation | user_id | data |
|-----------------------------|-----------|-----------|---------|---------------------|
| 2020-08-30 12:50:59.036 UTC | mGdNKy+o= | DELETE | ABC123 | {"name":"removed" } |
| 2020-08-20 16:50:59.036 UTC | mGdNKy+o= | UPDATE | ABC123 | {"name":"final" } |
| 2020-08-05 20:45:36.936 UTC | mIICo9LY= | UPDATE | ZXY432 | {"name":"asdf" } |
| 2020-08-05 20:45:21.023 UTC | nEDKyCks= | UPDATE | ABC123 | {"name":"other" } |
| 2020-08-03 12:40:49.036 UTC | GxnbUqQ0= | CREATE | ABC123 | {"name":"initial" } |
| 1970-01-01 00:00:00 UTC | 1y+6fVWo= | IMPORT | ZXY432 | {"name":"test" } |
NOTE: operation can be either "CREATE", "UPDATE", "DELETE" or "IMPORT". Since users can be updated multiple times there can be multiple rows with the same user_id
The goal is to display the event_id and data column associated with the most recent operation in the user's table for the matching ID. Using example data, the intended result would be:
| event_id | event_data | user_id | user_data |
|-----------|------------------|---------|-------------------|
| hHZuTE8Y= | {"some":"json" } | ABC123 | {"name":"final" } |
| tF5Gky8Q= | {"foo":"item" } | ZXY432 | {"name":"asdf" } |
| 1dOU8pOo= | {"bar":"val" } | ABC123 | {"name":"other" } |
I have tried the following but it produces duplicate rows (one for each row in changelog table with matching id):
SELECT
events.event_id as event_id,
events.data as event_data,
users.user_id as user_id,
users.data as user_data
FROM my_project.my_dataset.project_events as events
LEFT JOIN my_project.my_dataset.users_changelog as users
ON events.user_id = users.user_id AND users.timestamp <= events.timestamp