4

I have a PostgreSQL database with some schemas, like below:

My_Database
 |-> Schemas
    |-> AccountA
    |-> AccountB
    |-> AccountC
    |-> AccountD
    |-> AccountE
           .
           .
           .
    |-> AccountZ

All schemas have a table called product which has a column called title. I would like to know if is possible to execute a select statement to retrieve all records from all schemas with a certain conditional.

The only way I found until now is to run a query account by account, like below.

SET search_path TO AccountA;

SELECT title FROM product WHERE title ILIKE '%test%';

Schemas are created dynamically, so I don't know their names or how many of them exist.

Marcio Mazzucato
  • 8,841
  • 9
  • 64
  • 79
  • 2
    Without using system tables to get a list of schema's having this table and then using dynamic SQL to generate a union between all the queries. I don't think it can be done... and it certainly wouldn't be fast. – xQbert Dec 13 '13 at 20:44
  • You will need to use dynamic SQL to complete this, and it won't be an easy one to do. This schema is horrible, any will to normalize this? One table with all accounts is far better than a schema per account...to the point where you seem pretty crippled by this setup. – Twelfth Dec 13 '13 at 20:47
  • Note that a schema is a database, not a table. You have one scheme (My Database), with lots of tables. – elixenide Dec 13 '13 at 20:50
  • 1
    @Twelfth: it's not hard, and his design is not horrible. Some industries can't live with the idea of having their data live near their competitor's, and this schema suits that scenario quite fine. Plus, it makes joins a heck of a lot faster for more typical "local" queries than when you've a table shared by all app instances that require you to cart a site_id around everywhere. – Denis de Bernardy Dec 13 '13 at 20:53
  • @Denis, Great! Our customers have exactly this idea, they don't like to know that your data is so close of a possible competidor. And i think this way the things become easy to manage. – Marcio Mazzucato Dec 13 '13 at 20:56
  • 1
    Not just that: it also makes it easier to migrate and upgrade. You can then have multiple versions of the same app running concurrently from the same database, while they get migrated one by one. There are a few cockroaches along the way, though; in particular in the area of compiling queries that span all schemas. If things being perfectly up to date aren't critical, you really ought to maintain a few materialized views on key fields from those, to make your life easier. – Denis de Bernardy Dec 13 '13 at 21:03
  • @Denis - from a purely reporting standpoint (such as a list of all products from all schema's), this is horrid to deal with. In a single table, this would be a very simple select statement to get all products. In this setup, every query that wants to view something from all schema's will need to be written as a cursor that is loaded by a query on the system tables to get all the dynamically named schema's and then loop through that building up a giant union statement. Perhaps there are operational reasons to this setup, but reporting from more than one account is now a looping cursor. – Twelfth Dec 13 '13 at 21:08
  • And I agree entirely on the reporting aspects — they suck short of having a few astute materialized views or similar to quickly yield relevant data. But for day to day operation, sharding data this way is superior than a huge table for multi-tenant applications in a number of aspects. Namely performance, permissions, migration, sharding and — perhaps most importantly in OP's case, since it's a B2B product — sales. The ability to tell "your data lives in its own schema" to an enterprise prospect, irrespective of how bizarre it might be from a normalization standpoint, is occasionally key. – Denis de Bernardy Dec 13 '13 at 21:29
  • Might not be the worst idea to create a data warehouse and an ETL process to collect the data across all schemas and insert it into a normalized table. Would create a reporting environment and maintain this setup for the business. If the OP can control the table create statements, then Erwin's answer below looks ideal. – Twelfth Dec 13 '13 at 21:38
  • 2
    @xQbert: Can't be done? Challenge accepted! – Erwin Brandstetter Dec 14 '13 at 01:14
  • @ErwinBrandstetter yep I stand behind my comment. Based on the question the Author seems to indicate he doesn't have control over the table design (doesn't know names of schema's or how many exist as it's dynamic) without this, OOP solution doesn't work and dynamic SQL and union are the only options *I* can think of – xQbert Dec 14 '13 at 01:44

2 Answers2

9

With inheritance like @Denis mentioned, this would be very simple. Works for Postgres 8.4, too. Be sure to consider the limitations.

Basically, you would have a master table, I suppose in a master schema:

CREATE TABLE master.product (title text);

And all other tables in various schemata inherit from it, possibly adding more local columns:

CREATE TABLE a.product (product_id serial PRIMARY KEY, col2 text)
INHERITS (master.product);

CREATE TABLE b.product (product_id serial PRIMARY KEY, col2 text, col3 text)
INHERITS (master.product);

etc.

Tables don't have to share the same name or schema.
Then you can query all tables in a single fell swoop:

SELECT title, tableoid::regclass::text AS source
FROM   master.product
WHERE  title ILIKE '%test%';

tableoid::regclass::text is a handy way to tell the source of each row. But it interacts with the search_path. See:

fiddle
Old sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • sure change the context of the question to suit an answer and things work. Using existing design presented... not so much... I stand behind my original comment. without dynamic sql an unions though an indeterminate number of product tables this isn't possible. Change the design to use OOP and inheritance, or use partitioning in other dbs (I think postgresql does this though inheritance) and yeah it can be done. Perhaps I built a box where there wasn't one, perhaps you destroyed a box where one existed. Only the Author can tell us. – xQbert Dec 14 '13 at 01:42
  • Erwin, nice suggestion! In your opinion, would be better to use inheritance or dynamic SQL? My main point is to create a database easy to manage and with a good performance. – Marcio Mazzucato Dec 16 '13 at 18:20
  • 1
    @MarcioSimao: There is nothing in the question that would conflict with inheritance (which has some limitations, read the manual!). As long as that is not the case, my suggestion is clearly simpler, faster and safer than dynamic SQL, which is more flexible. – Erwin Brandstetter Dec 16 '13 at 18:35
  • @ErwinBrandstetter, I am your fan! You are a PostgreSQL master! I will use your solution in my next project. – Marcio Mazzucato Dec 16 '13 at 20:27
  • @ErwinBrandstetter, Erwin, i am using this solution and now i need to put the statement `tableoid::regclass::text` in some `JOIN` clauses. Do you think i will have performance troubles? Thanks! – Marcio Mazzucato Mar 18 '16 at 12:40
  • 1
    @MarcioMazzucato: Try it. Test with `EXPLAIN ANALYZE`. Post another question if you run into problems. You can always link to this answer for context. – Erwin Brandstetter Mar 18 '16 at 17:48
2

You basically want a union all:

SELECT title FROM AccountA.product WHERE title ILIKE '%test%'
UNION ALL
SELECT title FROM AccountB.product WHERE title ILIKE '%test%'
UNION ALL
...;

You can do so automatically by using dynamic SQL and the catalog to locate all AccountXYZ schemas that have a products table.

Alternatively, create a AllAccounts schema with similar tables as the ones in individual schemas, and use table inheritance.

Note that neither will tell you which schema the data is from, however. In the former case, this is easy enough to add; not so much in the latter unless you add an extra column.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154