Suppose I have this piece of data (in reality roughly 5,000 entries every few seconds, initially 50,000 from a data dump):
[
  {
    "person": {
      "name": "Johnny Bravo",
      "location": {
        "zipcode": 11111,
        "address": "1 Second Ave"
      },
      "phone_numbers": [ 15007774321, 12227435432 ]
    }
  }
]
and this schema:
CREATE TABLE location(
  id serial PRIMARY KEY,
  zipcode integer NOT NULL,
  address varchar(255) NOT NULL
);
CREATE TABLE person(
  id serial PRIMARY KEY,
  name varchar(255) UNIQUE,
  location_id integer NOT NULL REFERENCES location(id)
);
CREATE TABLE phone_number(
  id serial PRIMARY KEY,
  "number" bigint NOT NULL  -- bigint so 11-digit phone numbers fit
);
CREATE TABLE person_number_relationship(
  id serial PRIMARY KEY,
  phone_number_id integer NOT NULL REFERENCES phone_number(id),
  person_id integer NOT NULL REFERENCES person(id)
);
This is an oversimplification, but the parts that matter are here. Currently, to insert this data I use the language accessing the database (Scala) to do most of the work (the corresponding SQL is sketched after the list):
- Insert all locations and then select all locations (2 queries)
- Replace each location with its location id in the persons list
- Insert all phone numbers and then select all phone numbers (2 queries)
- Create a map from each phone number's number to its id
- Insert the persons list and then select each person's id and name (2 queries)
- Create a map from each person's name to its id
- Build a new list of (person_id, phone_number_id) pairs from each person's phone_numbers list and the two maps
- Insert the new list (1 query)
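For concreteness, the statements issued per batch look roughly like this. This is only a sketch: the real inserts are multi-row, and the literal ids in the last two inserts stand in for values looked up from the maps built in Scala.

-- Locations: insert, then read back to learn the generated ids
INSERT INTO location (zipcode, address) VALUES (11111, '1 Second Ave');
SELECT id, zipcode, address FROM location;

-- Phone numbers: insert, then read back to build the number -> id map
INSERT INTO phone_number ("number") VALUES (15007774321), (12227435432);
SELECT id, "number" FROM phone_number;

-- Persons: insert with the resolved location_id, then read back to build the name -> id map
INSERT INTO person (name, location_id) VALUES ('Johnny Bravo', 1);
SELECT id, name FROM person;

-- Join rows built in Scala from the two maps
INSERT INTO person_number_relationship (person_id, phone_number_id) VALUES (1, 1), (1, 2);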
The question I am asking can be broken into two parts:
- How can I insert into two tables where one table requires a generated id from the other?
- How can I insert a many-to-many relationship for two tables when the ids are generated on insert?
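For the first part, the kind of thing I am after (assuming this is PostgreSQL, which the serial columns suggest) is INSERT ... RETURNING inside a writable CTE, so the person row can use the location's generated id without a round trip. A minimal sketch with a single hard-coded row:

-- Insert a location and a person referencing it in one statement
WITH new_location AS (
  INSERT INTO location (zipcode, address)
  VALUES (11111, '1 Second Ave')
  RETURNING id
)
INSERT INTO person (name, location_id)
SELECT 'Johnny Bravo', id
FROM new_location;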
Ideally I would like to push all of this logic down to the database, so I won't be wasting memory sending extra information back and forth between the driver and the database.
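The direction I am imagining is chaining writable CTEs so the generated ids flow straight into the join table. A sketch for a single hard-coded person (I would still need a way to do this for thousands of entries at a time, e.g. by passing arrays or loading a staging table, which is the part I am unsure about):

WITH new_location AS (
  INSERT INTO location (zipcode, address)
  VALUES (11111, '1 Second Ave')
  RETURNING id
), new_person AS (
  -- person gets the location id generated above
  INSERT INTO person (name, location_id)
  SELECT 'Johnny Bravo', id FROM new_location
  RETURNING id
), new_numbers AS (
  INSERT INTO phone_number ("number")
  VALUES (15007774321), (12227435432)
  RETURNING id
)
-- one person row crossed with its phone numbers gives the join rows
INSERT INTO person_number_relationship (person_id, phone_number_id)
SELECT new_person.id, new_numbers.id
FROM new_person CROSS JOIN new_numbers;

This covers the many-to-many part for one person, but it is not obvious to me how to extend it to a whole batch where each person has a different set of numbers.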