Author of Kiba here!
This is a common requirement, but handling it can be more or less complex (and this is not specific to Kiba). Here are a few points you'll need to think about.
## Handling of foreign keys
The main problem here is that you'll want to keep the relationships between services and bookings, once they are inserted.
### Foreign keys using business keys
A first (and easiest) way to handle this is to use a foreign-key constraint on "booking number", and make sure to insert that booking number in each service row, so that you can leverage it later in your queries. If you do this (see https://stackoverflow.com/a/18435114/20302), you'll have to set a unique constraint on "booking number" in the bookings target table.
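For concreteness, here is a minimal sketch of the schema this implies, written as a Sequel migration (the table and column names `bookings`, `services` and `booking_number` are assumptions based on the discussion above, not something prescribed by Kiba):

```ruby
Sequel.migration do
  change do
    alter_table(:bookings) do
      # The business key must be unique so it can be the target
      # of a foreign-key constraint
      add_unique_constraint [:booking_number]
    end

    alter_table(:services) do
      add_column :booking_number, String, null: false
      # The foreign key targets the business key, not the surrogate id
      add_foreign_key [:booking_number], :bookings, key: :booking_number
    end
  end
end
```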
### Foreign keys using primary keys
If you instead prefer to have a `booking_id` which points to the `bookings` table `id` key, things are a bit more complicated.
If this is a one-off import targeting an empty table, I recommend that you arbitrarily force the primary key using something like:
```ruby
transform do |r|
  # Maintain a simple counter across rows and use it as the primary key
  # (safe only because the target table starts out empty)
  @row_index ||= 0
  @row_index += 1
  r.merge(id: @row_index)
end
```
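One caveat worth noting (assuming your target is Postgres and `id` is backed by a sequence): forcing the primary key this way leaves the sequence out of sync, so you'll likely want to realign it after the import, e.g. with Sequel's `DB.reset_primary_key_sequence(:bookings)`.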
If this is not a one-off import, you will have to:
* Upsert bookings in a first pass
* In a second pass, look up the bookings (via SQL queries) to figure out which `id` to store in `booking_id`, then upsert the services (see the sketch below)
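Here is a minimal sketch of what the lookup step in that second pass could look like, inside a Kiba job definition (assuming Sequel for the connection, and the table/column names used above):

```ruby
require 'sequel'

DB = Sequel.connect(ENV.fetch('DATABASE_URL'))

# For each service row, resolve the parent booking's primary key
# from its business key, and store it in booking_id
transform do |row|
  booking_id = DB[:bookings]
    .where(booking_number: row.fetch('booking_number'))
    .get(:id)
  row.merge('booking_id' => booking_id)
end
```

In a real job you would probably want to batch or cache these lookups rather than issuing one query per row.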
As you can see, it's a bit more work, so stick with option 1 if you don't have strong requirements around this (although option 2 is more solid in the long run).
## Example implementation (using Kiba Pro & business keys)
The simplest way to achieve this (assuming your target is Postgres) is to use Kiba Pro's SQL Bulk Insert/Upsert destination.
It would go this way (in a single pass):
```ruby
extend Kiba::DSLExtensions::Config

config :kiba, runner: Kiba::StreamingRunner

# Iterate over each input file
source Kiba::Common::Sources::Enumerable, -> { Dir["input/*.json"] }

# Parse each file and grab its "bookings" array
transform { |r| JSON.parse(IO.read(r)).fetch('bookings') }

# Generate one row per element of "bookings"
transform Kiba::Common::Transforms::EnumerableExploder

# SNIP (remapping / renaming of fields etc)

first_destination = nil

# Upsert the parent rows (bookings), without their nested services
destination Kiba::Pro::Destinations::SQLBulkInsert,
  row_pre_processor: -> (row) { row.except("services") },
  dataset: -> (dataset) {
    dataset.insert_conflict(target: :booking_number)
  },
  after_read: -> (d) { first_destination = d }

# Upsert the child rows (services), making sure the parent rows
# are flushed to the database first
destination Kiba::Pro::Destinations::SQLBulkInsert,
  row_pre_processor: -> (row) { row.fetch("services") },
  dataset: -> (dataset) {
    dataset.insert_conflict(target: :service_number)
  },
  before_flush: -> { first_destination.flush }
```
Here we iterate over each input file, parsing it and grabbing the "bookings", then generating one row per element of "bookings".
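For illustration, here is the kind of input document this pipeline assumes (a hypothetical shape, with field names made up to match the code above):

```json
{
  "bookings": [
    {
      "booking_number": "B-1001",
      "customer": "ACME Corp",
      "services": [
        { "service_number": "S-1", "booking_number": "B-1001" },
        { "service_number": "S-2", "booking_number": "B-1001" }
      ]
    }
  ]
}
```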
We have 2 destinations, both doing an "upsert" (insert or update), plus one trick to ensure we save the parent rows before we insert the children, to avoid failures due to missing referenced records.
You can of course implement this yourself, but it is a bit of work!
If you need to use primary-key based foreign keys, you'll likely have to split the job in 2 passes (one for each destination), then add some form of lookup in the middle (as sketched earlier).
## Conclusion
I know this is not trivial (depending on what you need, and whether you use Kiba Pro or not), but at least I'm sharing the patterns that I use in such situations.
Hope it helps a bit!