
I need to create a table (PostgreSQL 9.1) and I am stuck. Could you possibly help?

The incoming data can assume either of the two formats:

  1. client id(int), shop id(int), asof(date), quantity
  2. client id(int), shop type(int), shop genre(varchar), asof(date), quantity

The given incoming CSV template is: {client id, shop id, shop type, shop genre, asof, quantity}

In the first case, the key is -- client id, shop id, asof

In the second case, the key is -- client id, shop type, shop genre, asof

I tried something like:

    create table(
            client_id       int references...,
            shop_id         int references...,
            shop_type       int references...,
            shop_genre      varchar(30),
            asof            date,
            quantity        real,
            primary key( client_id, shop_id, shop_type, shop_genre, asof )
    );

But then I ran into a problem: when the data is in format 1, the inserts fail because shop_type and shop_genre are NULL, and primary key columns cannot be NULL.

The queries within a client can be either by shop id, or by a combination of shop type and genre. There are no use cases of partial or regex matches on genre.

What would be a suitable design? Must I split this into 2 tables and then take a union of search results? Or, is it customary to put 0's and blanks for missing values and move along?

If it matters, the table is expected to be 100-500 million rows once all historic data is loaded.

Thanks.

Dinesh

2 Answers


You could try partial unique indexes, also known as filtered or conditional unique indexes: http://www.postgresql.org/docs/9.2/static/indexes-partial.html

Basically, what it comes down to is that uniqueness is only enforced on rows that satisfy the index's WHERE clause.

For example (of course, test for correctness and impact on performance):

    CREATE TABLE client(
            pk_id           SERIAL,
            client_id       int,
            shop_id         int,
            shop_type       int,
            shop_genre      varchar(30),
            asof            date,
            quantity        real,
            PRIMARY KEY (pk_id)
    );

    -- Each index enforces uniqueness only on the rows matching its WHERE clause.
    CREATE UNIQUE INDEX uidx1_client
        ON client
        USING btree
        (client_id, shop_id, asof, quantity)
        WHERE client_id = 200;

    CREATE UNIQUE INDEX uidx2_client
        ON client
        USING btree
        (client_id, asof, quantity)
        WHERE client_id = 500;
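
For the two formats described in the question, the filter could instead test whether shop_id is present. A minimal sketch along the same lines (untested; the index names are illustrative, and it assumes shop_id is NULL for format-2 rows):

    -- Enforce each format's natural key only on the rows it applies to.
    CREATE UNIQUE INDEX uidx_client_by_shop
        ON client (client_id, shop_id, asof)
        WHERE shop_id IS NOT NULL;

    CREATE UNIQUE INDEX uidx_client_by_genre
        ON client (client_id, shop_type, shop_genre, asof)
        WHERE shop_id IS NULL;
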
Kuberchaun

A simple solution would be to create a separate primary key field whose value is generated by one of two algorithms, depending on which format the incoming row uses.
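
A minimal sketch of that idea using a trigger, reusing the client table from the other answer (it assumes a text column, say surrogate_key, is added and used as the primary key; the function and column names are illustrative, not part of the original answer):

    -- Build the surrogate key from whichever natural key the row carries.
    CREATE OR REPLACE FUNCTION build_surrogate_key() RETURNS trigger AS $$
    BEGIN
        IF NEW.shop_id IS NOT NULL THEN
            -- format 1: client id, shop id, asof
            NEW.surrogate_key := NEW.client_id || ':' || NEW.shop_id || ':' || NEW.asof;
        ELSE
            -- format 2: client id, shop type, shop genre, asof
            NEW.surrogate_key := NEW.client_id || ':' || NEW.shop_type || ':'
                                 || NEW.shop_genre || ':' || NEW.asof;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_build_surrogate_key
        BEFORE INSERT OR UPDATE ON client
        FOR EACH ROW EXECUTE PROCEDURE build_surrogate_key();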

If you wanted a fully normalised solution, you would probably need to split the shop information into two separate tables and reference them from this table using outer joins.
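
A rough sketch of what that normalised layout might look like (table and column names are illustrative, not from the original answer):

    -- Shop identified directly by id (format 1).
    CREATE TABLE shop (
        shop_id    serial PRIMARY KEY
    );

    -- Shop described by type and genre (format 2).
    CREATE TABLE shop_profile (
        shop_profile_id serial PRIMARY KEY,
        shop_type       int,
        shop_genre      varchar(30)
    );

    CREATE TABLE quantities (
        client_id       int,
        shop_id         int REFERENCES shop,
        shop_profile_id int REFERENCES shop_profile,
        asof            date,
        quantity        real
    );

    -- Outer joins pull in whichever side is populated for a given row.
    SELECT q.client_id, q.asof, q.quantity, s.shop_id, p.shop_type, p.shop_genre
    FROM quantities q
    LEFT JOIN shop s ON s.shop_id = q.shop_id
    LEFT JOIN shop_profile p ON p.shop_profile_id = q.shop_profile_id;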

You may also be able to use the table inheritance feature available in Postgres.
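
A sketch of the inheritance layout Dinesh describes in the comments below (column types follow the question; note that a primary key is only enforced within each child table, not across the hierarchy):

    CREATE TABLE forecasts (
        client_id  int,
        asof       date,
        quantity   real
    );

    -- Rows identified by shop id (format 1).
    CREATE TABLE forecast_by_shop_id (
        shop_id    int,
        PRIMARY KEY (client_id, shop_id, asof)
    ) INHERITS (forecasts);

    -- Rows identified by shop type and genre (format 2).
    CREATE TABLE forecast_by_genre (
        shop_type  int,
        shop_genre varchar(30),
        PRIMARY KEY (client_id, shop_type, shop_genre, asof)
    ) INHERITS (forecasts);

    -- Querying the parent scans both children.
    SELECT * FROM forecasts WHERE client_id = 42 AND asof = '2013-01-01';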

drone.ah
  • This is an option I have seen, and it works. Basically you take the same approach I did with partial unique indexes, but use a trigger to enforce the where logic. See http://stackoverflow.com/questions/8250389/computed-calculated-columns-in-postgresql If the generated primary key gets ugly looking, you can consider hashing it with some algorithm that meets the needs before stuffing it into the column. It still looks kinda ugly, but at least it doesn't look like a Frankenstein. – Kuberchaun Jan 24 '13 at 15:33
  • I can see how to use this approach too, but I like the idea of a single auto-generated PK as it clicks better with the rest of my tables. Separately, is this a scenario where one uses inheritance? Like, a table forecasts with {client_id, asof}, and then two tables: one forecast_by_shop_id (shop_id) inherits (forecasts), and another forecast_by_genre (shop_type, shop_genre) inherits (forecasts)? I have not used the inheritance feature in databases before. Thanks. – Dinesh Jan 25 '13 at 02:22
  • I have never used table inheritance. However, I think that it can provide a solution which gives you the two different sets of primary keys and prevents the creation of data that does not adhere to the two requirements. – drone.ah Jan 25 '13 at 07:24