2

The table

I got a table that contains price for some 1 000 000 articles. The articles got a uniques ID-number but the table contains prices from multiple stores. Thus if two stores got the same article the uniques ID will not be unique for the table.

Table Structure

table articles

id INT

price IN

store VARCHAR(40)

Daily use

Except for queries using the ID-number by users I need to run daily updates where data from csv-files insert/update each article in the table. The choosen procedure is to try to select an article and then perform either an insert or an update.

Question

With this in mind, which key should I choose?

Here are some solutions that Ive been considering:

  • FULLTEXT index of the fields isbn and store
  • Add a field with a value generated by isbn and store that is set as PRIMARY key
  • One table per store and use isbn as PRIMARY key
Joseph
  • 1,734
  • 6
  • 29
  • 51
  • Can you post the structure of your table please? – Dave Rix Oct 12 '11 at 12:49
  • 2
    Sounds like you need to restructure your tables. You should only have one row per item. If it has multiple prices these would be reflected in a separate table, which could be a many-to-many between item and store. – John Oct 12 '11 at 12:52
  • @John OK, and then use an e.g. autoincremented id as key there, but how can I increase the performans since the update procedure needs to go through 1M articles by isbn and update most of them, what do you suggest? – Joseph Oct 12 '11 at 13:01

2 Answers2

0

Use a compound primary key consisting of the store ID and the article ID - that'll give you a unique primary key for each item on a per-store basis and you don't need a separate field for it (assuming the store id and article id are already in the table).

Ideally you should have 3 tables... something like:

article
--------------------------------------------
id | isbn | ... etc ...


store
--------------------------------------------
id | description | ... etc ...


pricelist
--------------------------------------------
article_id | store_id | price | ... etc ...

With the PRIMARY KEY for pricelist being a compound key made up of article_id and store_id.

EDIT : (updated to incorporate an answer from the comment)

Even on a million rows the UPDATE should be OK (for a certain definition of OK, it might still take a little while with 1 million+ rows) since the article_id and store_id comprise the PRIMARY KEY - they'll both be indexed.

You'll just need to write your query so that it's along the lines of:

UPDATE pricelist SET price = {$fNewPrice} 
WHERE article_id = {$iArticleId} 
AND store_id =` '{$sStoreId}'

Though you may want to consider converting the PRIMARY KEY in the store table (store.id - and therefore also pricelist.store_id in the pricelist table) to either an unsigned INT or something like CHAR(30).

Whilst VARCHAR is more efficient when it comes to disk space it has a couple of drawbacks:

1: MySQL isn't too keen on updating VARCHAR values and it can make the indexes bloat a bit so you may need to occasionally run OPTIMIZE TABLE on it (I've found this on an order_header table before).

2: Any (MyISAM) table with non-fixed length fields (such as VARCHAR) will have to have a DYNAMIC row format which is slightly less efficient when it comes to querying it - there's more information about that on this SO post: MySQL Row Format: Difference between fixed and dynamic?

Community
  • 1
  • 1
CD001
  • 8,332
  • 3
  • 24
  • 28
  • Since I need to run updates on +1M posts in e.g. pricelist. Should I then check for the composite key of article_id and store_id? Will that create any performance issue? My testing is that it works fine searching with a PRIMARY key but is too demanding searching by a non-key. – Joseph Oct 12 '11 at 13:37
  • It should be OK since both the `article_id` AND `store_id` will be indexed (as they make up the primary key) - you'll just need to write your query so that it's along the lines of `UPDATE pricelist SET price = {$fNewPrice} WHERE article_id = {$iArticleId} AND store_id = '{$sStoreId}'` -- I'll update the answer since there's a couple of provisos :) – CD001 Oct 12 '11 at 15:41
0

Your indexes should be aligned with your queries. Certainly there should be a primary key on the articles table using STORE and ID - but the order in which they are declared will affect performance - depending on the data in the related tables and the queries applied. Indeed the simplest solution might be PRIMARY KEY(STORE, ID) and UNIQUE KEY(ID, STORE) along with foreign key constraints on the two fields.

i.e. since it makes NO SENSE to call this table 'articles', I'll use the same schema as CD001:

CREATE TABLE pricelist (
    id INT NOT NULL ,
    price INT,
    store VARCHAR(40) NOT NULL
    PRIMARY KEY(store,id),
    UNIQUE KEY rlookup (id, store)
    CONSTRAINT id FOREIGN KEY articles.id,
    CONSRAINT store FOREIGN KEY store.name
);

Which also entails having a primary key on store using name.

The difference between checking a key based on a single column and one based on 2 columns is negligible - and normalising your database properyl will save you a LOT of pain.

symcbean
  • 47,736
  • 6
  • 59
  • 94