5

I have three tables that control products, colors and sizes. Products can have or not colors and sizes. Colors can or not have sizes.

product      color                           size
-------      -------                         -------
id           id                              id
unique_id    id_product (FK from product)    id_product (FK from version)
stock        unique_id                       id_version (FK from version)
title        stock                           unique_id
                                             stock

The unique_id column, that is present in all tables, is a serial type (autoincrement) and its counter is shared with the three tables, basically it works as a global unique ID between them.

It works fine, but i am trying to increase the query performance when i have to select some fields based in the unique_id.

As i don't know where is the unique_id that i am looking for, i am using UNION, like below:

select title, stock
from product 
where unique_id = 10

UNION

select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10

UNION

select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;

Is there a better way to do this? Thanks for any suggestion!

EDIT 1

Based on @ErwinBrandstetter and @ErikE answers i decided to use the below query. The main reasons is:

1) As unique_id has indexes in all tables, i will get a good performance

2) Using the unique_id i will find the product code, so i can get all columns i need using a another simple join

SELECT 

    p.title,
    ps.stock

FROM (

    select id as id_product, stock
    from product 
    where unique_id = 10

    UNION

    select id_product, stock
    from color
    where unique_id = 10

    UNION

    select id_product, stock
    from size
    where unique_id = 10

) AS ps

JOIN product p ON ps.id_product = p.id;
Marcio Mazzucato
  • 8,841
  • 9
  • 64
  • 79
  • why not join all the tables with unique_id? – 03Usr Jul 10 '12 at 21:49
  • @03Usr, If the product have more than one color and at least one size, the query was returning a tuple for each color, not just one row, as i was expecting. – Marcio Mazzucato Jul 10 '12 at 21:59
  • I can't see where you used `unique_id` in the query. – ypercubeᵀᴹ Jul 10 '12 at 22:00
  • @ypercube, Thanks for advice, i've made an update in `where` clause. – Marcio Mazzucato Jul 10 '12 at 22:01
  • Please remove all non-specific databases from the tags unless it truly *is* being run on said specific database. It's one thing to say "should only use standard SQL" .. –  Jul 10 '12 at 22:05
  • 1) If you have indexes on unique_id, what's the performance problem? 2) When a unique_id is found in both Colors and Sizes, is the Title column from each table different or the same? What are you actually trying to accomplish with your query--just determining if the unique_id exists, or what? – ErikE Jul 10 '12 at 22:13
  • @ErikE, 1) I am improving my knowledge about performance in PostgreSQL, so do you think if i use indexes on unique_id i will get a satisfactory performance? 2) `unique_id` will not be repeated between the tables `product`, `color` and `size`, it works like a global unique id. 3) I am trying to find the `unique_id`, retrieve its stock and some columns on table `product`. – Marcio Mazzucato Jul 10 '12 at 22:24
  • Please see my answer in response to your question. About "unique_id will not be repeated between the tables product, color and size, it works like a global unique id." I don't understand... if a product can come in a color and a size, what does the uniqueid in the color table mean when that product has sizes? Can you order Product A in Color B with no size specified, even when sizes exist for Color B? – ErikE Jul 10 '12 at 22:34
  • Which version of PostgreSQL do you use? – Erwin Brandstetter Jul 10 '12 at 22:35
  • @ErwinBrandstetter, I am using PostgreSQL 8.1.23. I've requested to my host to update it, but they didn't do it until now. – Marcio Mazzucato Jul 10 '12 at 22:37
  • @MarcioSimao: Oh, that's a bummer. 8.1 is [**ancient**](http://www.postgresql.org/support/versioning/). I had just come up with an experimental solution involving CTEs, but that's not for you, then ... – Erwin Brandstetter Jul 10 '12 at 22:41
  • @ErwinBrandstetter, I am thinking about to change my host, it can be dangerous. Thanks for the tip! – Marcio Mazzucato Jul 10 '12 at 22:42
  • @ErwinBrandstetter, For example, if i am including a product called "Test" and it has two colors (white, black) and two sizes (15, 16). Considering that the last `unique_id` is 99, i will get this: Test(100), Test-White(101), Test-Black(102), Test-White-15-(103), Test-White-16(104), Test-Black-15(105), Test-Black-16(106). When `stock` is NULL i know that this product has colors or sizes. – Marcio Mazzucato Jul 10 '12 at 22:51

5 Answers5

5

PL/pgSQL function

To solve the problem at hand, a plpgsql function like the following should be faster:

CREATE OR REPLACE FUNCTION func(int)
  RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN

RETURN QUERY
SELECT p.title, p.stock
FROM   product p
WHERE  p.unique_id = $1; -- Put the most likely table first.

IF NOT FOUND THEN
    RETURN QUERY
    SELECT p.title, c.stock
    FROM   color c
    JOIN   product p ON c.id_product = p.id
    WHERE  c.unique_id = $1;
END;

IF NOT FOUND THEN
    RETURN QUERY
    SELECT p.title, s.stock
    FROM   size s
    JOIN   product p ON s.id_product = p.id
    WHERE  s.unique_id = $1;
END IF;

END;
$BODY$;

Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.

RETURNS TABLE requires PostgreSQL 8.4, RETURN QUERY requires version 8.2. You can substitute both for older versions.

It goes without saying that you need to index the columns unique_id of every involved table. id should be indexed automatically, being the primary key.


Redesign

Ideally, you can tell which table from the ID alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third - or whatever suits your needs. This way, the least significant part of the number is easily distinguishable.

A plain integer spans numbers from -2147483648 to +2147483647, move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.


CTEs (experimental!)

If you cannot create a function for some reason, this pure SQL solution might do a similar trick:

WITH x(uid) AS (SELECT 10) -- provide unique_id here
    , a AS (
    SELECT title, stock
    FROM   x, product 
    WHERE  unique_id = x.uid
    )
    , b AS (
    SELECT p.title, c.stock
    FROM   x, color c
    JOIN   product p ON c.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM a)
    AND    c.unique_id = x.uid
    )
    , c AS (
    SELECT p.title, s.stock
    FROM   x, size s
    JOIN   product p ON s.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM b)
    AND    s.unique_id = x.uid
    )
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;

I am not sure whether it avoids additional scans like I hope. Would have to be tested. This query requires at least PostgreSQL 8.4.


Upgrade!

As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.


Query for PostgreSQL 8.1

As you are limited in your options, and a plpgsql function is not possible, this function should perform better than the one you have. Test with EXPLAIN ANALYZE - available in v8.1.

SELECT title, stock
FROM   product 
WHERE  unique_id = 10

UNION ALL
SELECT p.title, ps.stock
FROM   product p
JOIN  (
    SELECT id_product, stock
    FROM   color
    WHERE  unique_id = 10

    UNION ALL
    SELECT id_product, stock
    FROM   size
    WHERE  unique_id = 10
    ) ps ON ps.id_product = p.id;
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    I don't know about "much faster" *if* good indices are used .. but it could be "faster enough" to warrant usage so +1. –  Jul 10 '12 at 22:18
  • Great solution! The problem is that my host doesn't allow plpgsql functions =( – Marcio Mazzucato Jul 10 '12 at 22:18
  • @pst: It should be about twice as fast. Mostly because it only searches 1,5 tables on average. And query plans are also cached with plpgsql functions - a small additional win here. – Erwin Brandstetter Jul 10 '12 at 22:21
  • @ErwinBrandstetter I suppose I *generally* only consider differences that are the better part of an order of magnitude (base 10) or affect O(n), where it matters :) e.g. changing a TABLE SCAN to an INDEX SEEK or fixing the a bum plan to MERGE JOINS instead of an improperly-guessed LOOP JOIN. –  Jul 10 '12 at 22:41
  • @pst: I generally agree, but I take an order of magnitude (base 2) any day, if better is out of reach. Toned down "much faster" to just "faster" to accommodate for your input, though. – Erwin Brandstetter Jul 10 '12 at 23:54
  • I tried to run a query using `WITH` but i got a syntax error, certainly because of my version =(. I am trying other solution based in your suggestion, if works i will show the code. – Marcio Mazzucato Jul 11 '12 at 14:39
  • @MarcioSimao: Reading would have saved you the trouble: requires at least PostgreSQL 8.4. – Erwin Brandstetter Jul 11 '12 at 15:05
  • @ErwinBrandstetter, I confused with your last edit, but no problem. Can you please see my last edit and tell me what is your opinion about the query i decided to use? – Marcio Mazzucato Jul 11 '12 at 15:09
  • @MarcioSimao: I added a version to my answer that might be a bit faster. – Erwin Brandstetter Jul 11 '12 at 15:30
3

I think it's time for a redesign.

You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.

I have several ideas for you:

Change the Defaults

Just make each product required to have one color "no color" and one size "no size". Then you can query any table you want to find the info you need.

SuperType/SubType

Without too much modification you could use the supertype/subtype database design pattern.

In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.

If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.

You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.

A Bigger Redesign

You could completely change how Colors and Sizes are related to products.

First, your patterns of "has-a" are these:

  • Product (has nothing)
  • Product->Color
  • Product->Size
  • Product->Color->Size

There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.

Furthermore, when a product comes in colors and sizes, what does the uniqueid in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.

I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.

Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.

If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).

In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.

Your Question

You asked:

in PostgreSQL, so do you think if i use indexes on unique_id i will get a satisfactory performance?

Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which will be awful performance. I am sure that these indexes will make your queries lightning fast as it will take only one leaf-level read per table.

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • The topic "A Bigger Redesign" has some points that i will analyse. At this point i think it won't be possible to do a redesign, but it will be useful in other projects. I will try to improve the indexes and keep analysing the performance. – Marcio Mazzucato Jul 10 '12 at 23:02
1

There's an easier way to generate unique IDs using three separate auto_increment columns. Just prepend a letter to the ID to uniquify it:

Colors:

 C0000001
 C0000002
 C0000003

Sizes:

 S0000001
 S0000002
 S0000003
 ...

Products:

 P0000001
 P0000002
 P0000003
 ...

A few advantages:

  • You don't need to serialize creation of ids across tables to ensure uniqueness. This will give better performance.
  • You don't actually need to store the letter in the table. All IDs in the same table start with the same letter, so you only need to store the number. This means that you can use an ordinary auto_increment column to generate your IDs.
  • If you have an ID you only need to check the first character to see which table it can be found in. You don't even need to make a query to the database if you just want to know whether it's a product ID or a size ID.

A disadvantage:

  • It's no longer a number. But you can get around that by using 1,2,3 instead of C,S,P.
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • Thanks for your suggestion. The `unique_id ` in my case works like a barcode. For example, if i have a product called X and it has green and red colors, each color must have a unique barcode. My query simulates a query to find a certain barcode, it can be in `product`, `color` or `size` – Marcio Mazzucato Jul 10 '12 at 22:05
1

Your query will be pretty much efficient, as long as you have an index on unique_id, on every table and indices on the joining columns.

You could turn those UNION into UNION ALL but the won't be any differnce on performance, for this query.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
1

This is a bit different. I don't understand the intended behaviour if stocks exists in more than one of the {product,color,zsize} tables. (UNION will remove duplicates, but for the row-as-a-whole, eg the {product_id,stock} tuples. That makes no sense to me. I just take the first. (Note the funky self-join!!)

SELECT p.title
        , COALESCE (p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 on p2.id = p.id AND p2.unique_id = 10
LEFT JOIN color c on c.id_product = p.id AND c.unique_id = 10
LEFT JOIN zsize s on s.id_product = p.id AND s.unique_id = 10
WHERE COALESCE (p2.stock, c.stock, s.stock) IS NOT NULL
        ;
wildplasser
  • 43,142
  • 8
  • 66
  • 109