
I have a Rails 3.2 multi-tenant, subdomain-based app which I'm trying to migrate over to PostgreSQL's schemas (each account getting its own schema -- right now all of the accounts use the same tables).

So, I'm thinking I need to:

  1. Create a new DB
  2. Create a schema for each Account (its id) and the tables under them
  3. Grab all the data that belongs to each account and insert it into the new DB under the schema of said account

Does that sound correct? If so, what's a good way of doing that? Should I write a Ruby script that uses ActiveRecord to pluck the data and insert it into the new DB (pretty inefficient, but it should get the job done)? Or does Postgres provide good tools for doing such a thing?

EDIT:

As Craig recommended, I created schemas in the existing DB. I then looped through all of the Accounts in a Rake task, copying the data over with something like:

Account.find_each do |account|
  PgTools.set_search_path account.id, false

  # `admin` is the account's admin User, looked up earlier in the task
  sql = %{INSERT INTO tags SELECT DISTINCT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."tagger_id" = #{admin.id} AND "taggings"."tagger_type" = 'User'}
  ActiveRecord::Base.connection.execute sql

  # ...more such commands
end
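For reference, `PgTools` is a small helper module, not part of Rails. A minimal sketch of what such a helper might look like — the `account_<id>` schema-naming convention and the method signature are assumptions to match the call above:

```ruby
# Hypothetical sketch of the PgTools helper used in the Rake task above.
# Assumes each account's schema is named "account_<id>"; adjust to taste.
module PgTools
  module_function

  def schema_name(account_id)
    # Integer() guards against anything non-numeric sneaking into the SQL
    "account_#{Integer(account_id)}"
  end

  # include_public: whether to keep "public" on the path as a fallback
  def search_path_sql(account_id, include_public = true)
    path = %("#{schema_name(account_id)}")
    path += ", public" if include_public
    "SET search_path TO #{path};"
  end

  def set_search_path(account_id, include_public = true)
    ActiveRecord::Base.connection.execute search_path_sql(account_id, include_public)
  end
end
```

With that, `PgTools.set_search_path account.id, false` pins the connection to that account's schema only, so unqualified table names in subsequent SQL resolve there.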
Hauleth
kmurph79
    What PostgreSQL version are you working with? It helps to know as different Pg versions have different capabilities. As for conversion, I'd do the conversion with SQL personally, by creating the new schemas in the *same* database as the current one, migrating the data using appropriate `INSERT INTO ... SELECT` queries, then eventually dropping or archiving the old schema. – Craig Ringer Aug 07 '12 at 08:29
  • @CraigRinger thanks, I'll try that. I'm on the latest 9.1.4 – kmurph79 Aug 07 '12 at 22:49
    OK. Moved to answer and elaborated. – Craig Ringer Aug 08 '12 at 00:11

1 Answer


I'd do the conversion with SQL personally.

Create the new schemas in the same database as the current one for easy migration, because you can't easily query across databases with PostgreSQL.

Migrate the data using appropriate INSERT INTO ... SELECT queries. To do it without having to disable any foreign keys, you should build a dependency graph of your data. Copy the data into tables that depend on nothing first, then tables that depend on them, and so on.
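For example, with a hard-coded dependency order, the per-table copy statements could be generated like this — the table names and the `account_id` discriminator column are assumptions about the app's schema:

```ruby
# Tables listed parents-first so foreign keys are satisfied as rows are
# copied. The table names and the "account_id" column are placeholders
# for whatever the real schema uses.
TABLES_IN_DEPENDENCY_ORDER = %w[users projects tasks comments].freeze

def copy_statements(account_id, schema)
  TABLES_IN_DEPENDENCY_ORDER.map do |table|
    %(INSERT INTO "#{schema}"."#{table}" ) +
      %(SELECT * FROM public."#{table}" WHERE account_id = #{Integer(account_id)};)
  end
end
```

Each statement can then be run with `ActiveRecord::Base.connection.execute`, ideally all inside one transaction per account.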

You'll need to repeat this for each customer schema, so consider creating a PL/PgSQL function that uses EXECUTE ... dynamic SQL to:

  • Create the schema for a customer
  • Create the tables within the schema
  • Copy data in the correct order by looping over a hard-coded array of table names, doing:

    EXECUTE 'INSERT INTO ' || quote_ident(newschema) || '.' || quote_ident(tablename)
         || ' SELECT * FROM oldschema.' || quote_ident(tablename)
         || ' WHERE customer_id = ' || quote_literal(customer_id) || ';';


    where newschema, tablename and customer_id are PL/PgSQL variables.
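Put together, the function those bullets describe might look roughly like the following — written here as a Ruby heredoc so a Rake task can install it with `ActiveRecord::Base.connection.execute`. The table list, the `account_<id>` schema naming, and the `customer_id` column are placeholders for the real schema:

```ruby
# DDL for a conversion function along the lines sketched above. This is
# an illustrative sketch, not a drop-in implementation: the table array,
# schema naming, and "customer_id" column must match the actual app.
CONVERT_CUSTOMER_FN = <<~SQL
  CREATE OR REPLACE FUNCTION convert_customer(cust_id integer)
  RETURNS void AS $$
  DECLARE
    newschema text := 'account_' || cust_id;
    tablename text;
  BEGIN
    EXECUTE 'CREATE SCHEMA ' || quote_ident(newschema);
    -- tables listed in dependency order, per the note above
    FOREACH tablename IN ARRAY ARRAY['users', 'projects', 'tasks'] LOOP
      EXECUTE 'CREATE TABLE ' || quote_ident(newschema) || '.'
           || quote_ident(tablename)
           || ' (LIKE public.' || quote_ident(tablename)
           || ' INCLUDING ALL)';
      EXECUTE 'INSERT INTO ' || quote_ident(newschema) || '.'
           || quote_ident(tablename)
           || ' SELECT * FROM public.' || quote_ident(tablename)
           || ' WHERE customer_id = ' || quote_literal(cust_id);
    END LOOP;
  END;
  $$ LANGUAGE plpgsql;
SQL
```

`FOREACH ... IN ARRAY` is available from PostgreSQL 9.1, which matches the version in the comments above.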

You can then invoke that function from SQL. While you could do just `SELECT convert_customer(c.id) FROM customer c GROUP BY c.id`, I'd probably do it from an external control script so that each customer's work gets done and committed individually, avoiding the need to start again from scratch if the second-to-last customer conversion fails.
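A control script along those lines could look like this sketch. The `convert_customer` function name comes from above; the connection object is anything that responds to `exec_params` (for instance a `PG::Connection` from the `pg` gem), and since each call is a single statement it commits on its own in autocommit mode:

```ruby
# Runs convert_customer(id) once per customer id and collects failures
# instead of aborting the whole run, so one bad account doesn't force a
# restart from scratch. `conn` is anything with a PG-style exec_params.
def convert_all(conn, customer_ids)
  failed = {}
  customer_ids.each do |id|
    begin
      conn.exec_params("SELECT convert_customer($1)", [id])
    rescue StandardError => e
      failed[id] = e.message
    end
  end
  failed
end
```

The returned hash of `id => error message` makes it easy to re-run just the accounts that failed.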

For bonus crazy points it's even possible to define triggers on the main customer schema's tables that replicate changes by already-migrated customers over to the copy of their data in the new schema, so they can keep using the system during the migration. I'd avoid that unless the migration was just too big to do without downtime, as it'd be a nightmare to test. You'd also still need the triggers to throw an error on access by customer x while the migration of x's data was actually in progress, so it wouldn't be fully transparent.

If you're using different login users for different customers (strongly recommended) your function can also:

  • REVOKE rights on the schema from public
  • GRANT limited rights on the schema to the user(s) or role(s) who'll be using it
  • REVOKE rights on public from each table created
  • GRANT the desired limited rights on each table to the user(s) and role(s)
  • GRANT on any sequences used by those tables. This is required even if the sequence is created by a SERIAL pseudo-column.

That way all your permissions are consistent and you don't need to go and change them later. Remember that your webapp should never log in as a superuser.
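Sketched as generated SQL, the permission setup boils down to statements like these — the role naming, table grants, and sequence names here are placeholders, not anything Rails or Postgres dictates:

```ruby
# Generates the permission statements for one customer schema.
# Role, table, and sequence names are illustrative placeholders.
def permission_statements(schema, role, tables, sequences)
  stmts = []
  stmts << %(REVOKE ALL ON SCHEMA "#{schema}" FROM public;)
  stmts << %(GRANT USAGE ON SCHEMA "#{schema}" TO "#{role}";)
  tables.each do |t|
    stmts << %(REVOKE ALL ON "#{schema}"."#{t}" FROM public;)
    stmts << %(GRANT SELECT, INSERT, UPDATE, DELETE ON "#{schema}"."#{t}" TO "#{role}";)
  end
  sequences.each do |s|
    # needed even for sequences created implicitly by SERIAL columns
    stmts << %(GRANT USAGE, SELECT ON SEQUENCE "#{schema}"."#{s}" TO "#{role}";)
  end
  stmts
end
```

Running these right after the schema is created keeps every tenant's permissions consistent from day one.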

Craig Ringer
  • Hey Craig. Thanks for response. After finally deploying this to production, one issue I've run into is that the new tables start their auto incremented primary key sequences at 1 (obviously). This very occasionally causes conflicts where it tries to insert a record with an ID that already exists (as a record had been copied over with that ID). Any ideas for resolutions to this issue? – kmurph79 Sep 26 '12 at 23:13
    @kmurph79 It's usually a bad idea to insert non-generated keys into a PK column managed with `serial` and `nextval`. Assign a new key when you merge, keeping the old key in a nullable `old_key` column if you have to. If you can't do that for whatever business reasons, post a new question with a more detailed explanation of the issue and link it here. Options include uuid primary keys (ick), offset `increment by` sequences that'll always generate non-conflicting IDs, and `LOCK`ing the table before a merge then `setval()`ing the table's sequence to `max(id)+1` before releasing the lock. – Craig Ringer Sep 26 '12 at 23:36
  • Thanks again Craig. The last option, setval()ing with max(id) +1, seems the most straightforward. The answer to [this question](http://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync) is satisfactory, no? Could this incur any performance issues? – kmurph79 Sep 28 '12 at 15:43
    @kmurph79 Yep, in particular http://stackoverflow.com/a/3698777/398670 and http://stackoverflow.com/a/244265/398670, but **remember to `LOCK TABLE`** first, because otherwise someone might insert a row between the `select(max)` and the `setval`'s execution even if they're in the same statement. – Craig Ringer Sep 28 '12 at 23:24
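In SQL terms, the lock-then-`setval` approach from the comment above can be sketched like this, generated from Ruby so the table name is parameterized. `pg_get_serial_sequence` and `setval` are real Postgres functions; the wrapper and the assumption of an `id` serial primary key are mine:

```ruby
# Builds the statements to resync a serial PK sequence with the copied
# data, holding an exclusive lock so no insert can sneak in between the
# max(id) read and the setval. setval(..., false) means the next
# nextval() returns exactly max(id) + 1.
def resync_sequence_sql(table, pk = "id")
  <<~SQL
    BEGIN;
    LOCK TABLE "#{table}" IN EXCLUSIVE MODE;
    SELECT setval(pg_get_serial_sequence('#{table}', '#{pk}'),
                  COALESCE(max(#{pk}), 0) + 1, false)
    FROM "#{table}";
    COMMIT;
  SQL
end
```

Run once per table in each freshly migrated schema, this only costs one brief lock per table, so there's no ongoing performance impact afterwards.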