Suppose I have the following tables in an Oracle DB:
Foo:
+--------+---------+---------+
| id_foo | string1 | string2 |
+--------+---------+---------+
| 1      | foo     | bar     |
| 2      | baz     | bat     |
+--------+---------+---------+
Bar:
+--------+-----------+--------+
| id_bar | id_foo_fk | string |
+--------+-----------+--------+
| 1      | 1         | boo    |
| 2      | 1         | bum    |
+--------+-----------+--------+
When I insert into Foo using a Dataset and JDBC, for example
Dataset<Row> fooDataset = ...; // Dataset is initialized elsewhere
fooDataset.write().mode(SaveMode.Append).jdbc(url, table, properties);
an ID is auto-generated by the database. Now, when I need to save Bar using the same strategy, I want to be able to link it to Foo via id_foo_fk.
I looked into some possibilities, such as using monotonically_increasing_id() as suggested in this question, but it won't solve the issue, as I need the ID generated by the database. I also tried what was suggested in this question, but it leads to the same issue: the IDs are unique, but they are not the IDs the database generates.
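To illustrate why those IDs can't stand in for the database's, here is a minimal plain-Java sketch (no Spark involved). It assumes the layout that Spark's documentation describes for the current implementation of monotonically_increasing_id(): partition index in the upper 31 bits, record number within the partition in the lower 33 bits. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the documented ID layout of Spark's
// monotonically_increasing_id(); this is not Spark code.
class MonotonicIdSketch {

    // ID for the row at position rowInPartition inside partition partitionIndex
    // (assumed layout: partition index shifted into the upper bits).
    static long sparkStyleId(int partitionIndex, long rowInPartition) {
        return ((long) partitionIndex << 33) + rowInPartition;
    }

    // IDs for a dataset whose i-th partition holds rowsPerPartition[i] rows.
    static List<Long> idsForPartitions(int[] rowsPerPartition) {
        List<Long> ids = new ArrayList<>();
        for (int p = 0; p < rowsPerPartition.length; p++) {
            for (long r = 0; r < rowsPerPartition[p]; r++) {
                ids.add(sparkStyleId(p, r));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two rows, one per partition, like the two rows of Foo above.
        System.out.println(idsForPartitions(new int[] {1, 1})); // prints [0, 8589934592]
    }
}
```

If the two Foo rows end up in separate partitions, they would get 0 and 8589934592 on the Spark side, while the database assigns 1 and 2 to id_foo, so there is nothing to put into id_foo_fk.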
It's also not possible to select the rows back over JDBC afterwards, as string1 and string2 may not be unique. Nor is it possible to change the database: for instance, I can't change the key to a UUID, and I can't add a trigger for it. It's a legacy database that we can only use.
How can I achieve this? Is this possible with Apache Spark?