
I'm facing a task where I need to clone a PostgreSQL database, keeping all constraints, indexes, etc., but including only the records related to a specific column value.

In other words, I want to split one big database into multiple smaller databases.

For example, my original database has numerous schemas, each schema has numerous tables, and each table has records about multiple people. I want to clone it into a new database, but copy only the records related to a specific person, identified by a person id (i.e., clone all records in all tables that have person_id = xxx).

Is there a tool for this task, or any suggestions? (I'm familiar with Java and Python.)

    I don't think this is really a Python problem. I would start by dumping out the schema (no data) and applying that to your new DB: https://stackoverflow.com/a/14486505/3280538. After that you'll need to write queries to dump the rows you care about from specific tables. For each table you can import the values into your new database, to which you applied the schema before: https://stackoverflow.com/a/12824831/3280538 – flakes Aug 12 '20 at 01:32

1 Answer


The best way I have found to do this is to first export the complete schema using the pg_dump tool with the -s flag (to dump the schema only, and no data), and then export the data separately.
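For instance, a minimal sketch of the schema export, where the database name source_db and the output file name schema.sql are placeholder assumptions:

pg_dump -s -d source_db -f schema.sql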

To load your schema into a fresh, empty database, feed pg_dump's default plain-text output (a SQL script) back through psql. If you dump in one of the archive formats instead (e.g. pg_dump -Fc), use pg_restore, which will read the archive and use it to build the database.
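A minimal sketch of that step, reusing the placeholder names from above and assuming the new database is called test_db:

createdb test_db
psql -d test_db -f schema.sql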

When exporting the data, you'll need to classify each table (you can write a helper script to make this easier, or use Excel, etc.; see the query sketch after this list):

  1. Tables that you want a subset of data from, based on some condition (i.e. a certain value in person_id)
  2. Tables you want to copy in their entirety (like dimension tables, such as calendar and company_locations)
  3. Tables that you don't need any data from
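One way to make that classification less tedious is to ask the catalog which tables even have the filter column. A hypothetical sketch, assuming the column is literally named person_id everywhere:

SELECT table_schema, table_name
FROM information_schema.columns
WHERE column_name = 'person_id'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;

Tables returned by this query are candidates for (1); everything else falls under (2) or (3).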

For (1), you will need to write an appropriate SELECT query that returns the subset of data you want to copy. Put those queries in a script and have each one write its result to a separate file named <schema>.<table>. Lastly, use the psql utility to load the data into the test database. psql has a special \copy command that makes this easy. It can be used from a terminal like this:

psql -c "\copy schema.table FROM '~/dump_data/schema.table' WITH DELIMITER ',' CSV;"
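The export side can use the same mechanism in reverse: \copy also accepts a query and a TO clause, so each SELECT in your script can write its CSV file directly. A sketch, where source_db and person_id = 123 are placeholder assumptions:

psql -d source_db -c "\copy (SELECT * FROM schema.table WHERE person_id = 123) TO '~/dump_data/schema.table' WITH DELIMITER ',' CSV;"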

Use pg_dump again to take care of the tables falling under (2), using the -t <table name> flag to dump only the named tables, and -a to dump only data (no schema). This could also be added to the script for (1) by just adding an unqualified SELECT * for each table in (2) and loading the data the same way.
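For example, a sketch that copies the two dimension tables straight across (the table and database names are assumptions; -t can be repeated):

pg_dump -a -t schema.calendar -t schema.company_locations -d source_db | psql -d test_db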

Tables falling under (3) were already handled by the initial export.

Z4-tier