I have a table `A` with the following columns:

- `id`: UUID
- `str_identifier`: TEXT
- `num`: FLOAT

and a table `B` with similar columns:

- `str_identifier`: TEXT
- `num`: FLOAT
- `entry_date`: TIMESTAMP

I want to construct a SQLAlchemy query that does the following:

- finds entries in table `B` that do not exist yet in table `A`, and inserts them
- finds entries in table `B` that do exist in table `A` but have a different value for the `num` column, and updates them

The catch is that table `B` has the `entry_date` column, so it can contain multiple entries with the same `str_identifier` but different entry dates. I always want to perform this insert/update using the latest entry for a given `str_identifier` (if it has multiple entries in table `B`).
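
To pin down what "latest entry" means here, the selection step can be sketched in plain Python. This is a minimal illustration of the intended semantics, not a query: it assumes rows of `B` arrive as `(str_identifier, num, entry_date)` tuples with ISO-format date strings (so string comparison is chronological), and `latest_per_identifier` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape for table B: (str_identifier, num, entry_date).
Row = Tuple[str, float, str]


def latest_per_identifier(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep only the row with the greatest entry_date for each str_identifier."""
    latest: Dict[str, Row] = {}
    for row in rows:
        str_identifier, _num, entry_date = row
        current = latest.get(str_identifier)
        # ISO-8601 date strings compare chronologically as plain strings.
        if current is None or entry_date > current[2]:
            latest[str_identifier] = row
    return latest


b_rows = [
    ("str_id_1", 89.0, "2020-07-20"),
    ("str_id_1", 25.0, "2020-06-20"),
    ("str_id_1", 50.0, "2020-05-20"),
    ("str_id_2", 45.0, "2020-05-20"),
]
print(latest_per_identifier(b_rows))
```

Only these per-identifier winners should take part in the insert/update comparison against `A`.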
For example, if tables `A` and `B` look like this before the query runs:

[A]

| id | str_identifier | num |
|----|----------------|-----|
| 1  | str_id_1       | 25  |

[B]

| str_identifier | num | entry_date |
|----------------|-----|------------|
| str_id_1       | 89  | 2020-07-20 |
| str_id_1       | 25  | 2020-06-20 |
| str_id_1       | 50  | 2020-05-20 |
| str_id_2       | 45  | 2020-05-20 |

After the query, table `A` should look like:

[A]

| id | str_identifier | num |
|----|----------------|-----|
| 1  | str_id_1       | 89  |
| 2  | str_id_2       | 45  |
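
The before/after example can be reproduced end to end with Python's built-in `sqlite3` module. This is a sketch under several assumptions, since the question does not name the database: it uses SQLite (3.25 or newer, for window functions) instead of the real backend, an auto-assigned integer `id` instead of a UUID, and a `UNIQUE` constraint on `A.str_identifier` so that `ON CONFLICT` can detect existing rows. It picks the newest `B` row per `str_identifier` with `ROW_NUMBER()` and then upserts it into `A`.

```python
import sqlite3

# In-memory SQLite stand-in for the real database (assumption: not the actual
# backend). `id` is an auto-assigned INTEGER rather than a UUID, and
# str_identifier is UNIQUE in A so ON CONFLICT has a target.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        str_identifier TEXT UNIQUE NOT NULL,
        num REAL
    );
    CREATE TABLE B (
        str_identifier TEXT NOT NULL,
        num REAL,
        entry_date TEXT NOT NULL
    );
    INSERT INTO A (id, str_identifier, num) VALUES (1, 'str_id_1', 25);
    INSERT INTO B (str_identifier, num, entry_date) VALUES
        ('str_id_1', 89, '2020-07-20'),
        ('str_id_1', 25, '2020-06-20'),
        ('str_id_1', 50, '2020-05-20'),
        ('str_id_2', 45, '2020-05-20');
    """
)

# Rank B's rows per identifier (newest first), keep rank 1, then upsert:
# new identifiers are inserted, existing ones are updated only if num changed.
conn.execute(
    """
    INSERT INTO A (str_identifier, num)
    SELECT str_identifier, num
    FROM (
        SELECT str_identifier, num,
               ROW_NUMBER() OVER (
                   PARTITION BY str_identifier ORDER BY entry_date DESC
               ) AS rn
        FROM B
    )
    WHERE rn = 1
    ON CONFLICT (str_identifier) DO UPDATE SET num = excluded.num
    WHERE A.num IS NOT excluded.num
    """
)

rows = conn.execute("SELECT id, str_identifier, num FROM A ORDER BY id").fetchall()
print(rows)
```

In SQLAlchemy, the ranking subquery could be expressed with `func.row_number().over(partition_by=B.str_identifier, order_by=B.entry_date.desc())`; the upsert part is dialect-specific (for example, the PostgreSQL dialect's `insert(...).on_conflict_do_update(...)`), so its exact form depends on which database is actually in use.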

The query I've constructed so far should detect differences, but will adding `order_by(B.entry_date.desc())` ensure that the `exists` comparisons are only made against the latest entry for each `str_identifier`?
My Current Query
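```python
from sqlalchemy import and_, exists, join, select

# Rows of B whose (str_identifier, num) pair has no exact match in A.
query = (
    select([B.str_identifier, B.num])
    .select_from(
        # LEFT OUTER JOIN so rows of B with no counterpart in A are kept.
        join(B, A, onclause=B.str_identifier == A.str_identifier, isouter=True)
    )
    .where(
        ~exists().where(
            and_(
                B.str_identifier == A.str_identifier,
                B.num == A.num,
                # IS NOT NULL; `~B.num.in_([None])` renders NOT IN (NULL),
                # which is never true.
                B.num.isnot(None),
            )
        )
    )
)
```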
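
Whether `order_by(B.entry_date.desc())` changes which rows match can be checked on the example data. The sketch below uses SQLite through the standard-library `sqlite3` module and a hand-written SQL approximation of the query above (not the exact SQL SQLAlchemy would emit; the outer join is dropped because the correlated `NOT EXISTS` already covers unmatched rows). It runs the comparison with and without `ORDER BY`, then once more with `B` first reduced to its newest row per `str_identifier`.

```python
import sqlite3

# Example data from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (id INTEGER PRIMARY KEY, str_identifier TEXT, num REAL);
    CREATE TABLE B (str_identifier TEXT, num REAL, entry_date TEXT);
    INSERT INTO A VALUES (1, 'str_id_1', 25);
    INSERT INTO B VALUES
        ('str_id_1', 89, '2020-07-20'),
        ('str_id_1', 25, '2020-06-20'),
        ('str_id_1', 50, '2020-05-20'),
        ('str_id_2', 45, '2020-05-20');
    """
)

# Approximation of the current query: every B row with no exact match in A.
ANTI_JOIN = """
    SELECT B.str_identifier, B.num
    FROM B
    WHERE NOT EXISTS (
        SELECT 1 FROM A
        WHERE A.str_identifier = B.str_identifier AND A.num = B.num
    )
"""

unordered = conn.execute(ANTI_JOIN).fetchall()
# ORDER BY only sorts the final result set; it does not filter rows.
ordered = conn.execute(ANTI_JOIN + " ORDER BY B.entry_date DESC").fetchall()

# Reduce B to its newest row per identifier *before* comparing against A.
LATEST_ONLY = """
    SELECT latest.str_identifier, latest.num
    FROM (
        SELECT str_identifier, num,
               ROW_NUMBER() OVER (
                   PARTITION BY str_identifier ORDER BY entry_date DESC
               ) AS rn
        FROM B
    ) AS latest
    WHERE latest.rn = 1
      AND NOT EXISTS (
          SELECT 1 FROM A
          WHERE A.str_identifier = latest.str_identifier AND A.num = latest.num
      )
"""
latest_rows = conn.execute(LATEST_ONLY).fetchall()

print(sorted(ordered))
print(sorted(latest_rows))
```

On this data, both of the first two queries return the same rows, including the stale `('str_id_1', 50.0)` entry; only the version that filters on `ROW_NUMBER()` first limits the comparison to the latest entry per `str_identifier`.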