0

Suppose I have this piece of data (in reality roughly 5,000 entries every few seconds, initially 50,000 from a data dump):

[
  "person": {
    "name": "Johnny Bravo",
    "location": {
      "zipcode": 11111,
      "address": "1 Second Ave"
    },
    "phone_numbers": [ 15007774321, 12227435432 ]
  }
]

and this schema:

CREATE TABLE person(
  id serial PRIMARY KEY,
  name varchar(255) UNIQUE,
  location_id integer NOT NULL REFERENCES location(id)
)
CREATE TABLE location(
  id serial PRIMARY KEY,
  zipcode integer NOT NULL,
  address varchar(255) NOT NULL
)
CREATE TABLE phone_number(
  id serial PRIMARY KEY,
  "number" integer NOT NULL
)
CREATE TABLE person_number_relationship(
  id serial PRIMARY KEY,
  phone_number_id integer NOT NULL REFERENCES phone_number(id),
  person_id integer NOT NULL REFERENCES person(id)
)

This is an oversimplification but the parts that matter are here. Currently to insert this data I use the language accessing the database (scala) to do most of the work.

  1. Insert all locations and then select all locations (2 queries)
  2. Replace location with location id in persons list
  3. Insert phone numbers and then select all phone numbers (2 queries)
  4. Create a map of phone number's number to id
  5. Insert persons list and then select persons id and name (2 queries)
  6. Create a map of person's name to id
  7. Create a new list of person_id, phone_number_id from the phone_numbers list and the persons map
  8. Insert the new list (1 query)

The question I am asking can be broken into two parts:

  1. How can I insert into two tables where one table requires a generated id from the other?
  2. How can I insert a many to many relationship for two tables when the ids are generated on insert?

Ideally I would like to push all of this logic down to the database where I wont be wasting memory sending extra information between the driver and the database

andykais
  • 996
  • 2
  • 10
  • 27
  • 1
    See [here](https://stackoverflow.com/q/21386772/330315) or [here](https://stackoverflow.com/q/15627781/330315) or [here](https://stackoverflow.com/q/35906925/330315) –  Mar 14 '18 at 19:21
  • those answers work if none of the rows exist already and using `on conflict do update` for everything apparently has its [problems](https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql) – andykais Mar 14 '18 at 19:24

0 Answers0