3

Good day all,

I have been learning about databases and database design and I find I am still reaching a question I cannnot answer myself. So I pose the question to the community in hopes that someone with more knowledge/experience than I can answer it.

I have been tasked with working on a database which tracks stock levels accross a fleet of ships.

The current design has a table for each ship with a list of all possible parts (Machinery Type, Part Number, Make, Serial No etc.)

This means that the details of a piece of machinery or part can be duplicated many times (as many times as there are ships in fact).

I have been experimenting with a redesign based on what I have learnt myself, and I would propose a design along the following lines:

[SHIP]
ID, Name, Class, Tonnage, Fleet, Superintendent etc.

[Machinery]
ID, Type, Make, Model etc. (Can have separate table for manufacturers and types if required)

[Part]
ID, Part number, Description, etc.

The above would be the three main tables now is where it starts to get difficult.

Each ship can have multiple items of machinery and each machinery item could be present on multiple ships (requires a junction table)

Each machinery item can have multiple parts and each part could belong to multiple machinery items (another junction table)

There could be well into hundreds of thousands of parts which would make the junction tables huge.

Additionally as soon as you want to keep track of stock you are looking at another junction table

[Stock Level]
ShipID, PartID, Stock Level

Also if you wanted a minimum stock (Could be combined with Stock Level?)

[Min Stock]
ShipID, PartID, Min Stock

And finally if you were looking to have normalised database (i.e no Part No.1 , Part No.2 or Serial No.1, Serial No.2)

You would need to have a few extra tables

[Serial Numbers]
ShipID, MachineryID, Serial No

[Part Numbers]
PartID, Part Number

Serial numbers is probably going to be fairly standard and no problem however [part numbers] will require at least as many records as are in the [Parts] table.

Map (As best as I can represent without a picture, junctions omitted for simplicity)

  <>V represent many 
  -| represent one

   -----< Serial Numbers
  |             V
  |             |
Ship >---< Machinery >---< Parts ---< Part Numbers
  V                          V
  |                          |
   ------ Stock Level -------

Now the real question is am I missing something in the basic design principles that would eliminate such huge junction tables or is this to be expected with this kind of database.

Also in cases like with part numbers where normalisation requires an additional table with at least the same number of records rather than extra columns in the original table is this the kind of thing that you would later denormalise to improve query speed?

Any hints, tips or pointers to external resources (including other forums, tutorials, books) would be greatly appreciated.

All answers welcome, thank you in advance for any help you provide.

Dave

Dave Williams
  • 2,166
  • 19
  • 25
  • Huge junction tables are normal. If by huge you mean a few million rows, that's not really very big, and they are easily indexed. – mellamokb Jul 24 '12 at 14:52
  • Fast answers you guys, that answers one of my questions. I need to do some research on indexing though. I kind of thought indexing was done by the DBMS automatically but I will do some investigation as that could be a lot of help with speed. – Dave Williams Jul 24 '12 at 15:17

2 Answers2

7

First, no: many-to-many relationships are not indicators of bad design.

Second, join tables are always larger than the tables they're connecting. If you have two tables with 100 records each, the join table can be up to 10000 records.

But join tables are nothing but integer pairs, and when you index them they can be very fast.

Edited to add:

You commented:

I kind of thought indexing was done by the DBMS

It's up to you to tell the DBMS what to index. After that, keeping the indexes up-to-date is done automatically.

Some DBMS have tools that will monitor your system in action and advise you on what indexes would be beneficial.

Your design process is good, so you're probably on the right track.

egrunin
  • 24,650
  • 8
  • 50
  • 93
  • I take it that my original design (plus a bit more normalisation) would be more or less the correct approach then. And in the case of my "relatively small" database, indexed junction tables would perform fast enough. – Dave Williams Jul 24 '12 at 15:20
2

In addition to what egrunin said, junction tables are often good candidates for clustering (aka index-organized tables), if your DBMS supports it.

For example, clustering the...

[Stock Level]
ShipID, PartID, Stock Level

...would store rows with the same ShipID physically close together, making it very efficient to get all the rows associated with the given ship.

On the other hand, clustering...

[Stock Level]
PartID, ShipID, Stock Level

...would store rows with the same PartID close together, making it efficient to get the rows associated with the given part.

If you need to query in both of these directions, you'll need 2 indexes ({ShipID, PartID} and {PartID, ShipID}). Normally, secondary indexes in clustered tables must contain copy of the whole PK, requiring additional storage and causing double-lookup, making it expensive to cluster a table with multiple indexes. In this case however, we cover the same fields anyway (just in different order) so there is no overhead in the secondary index. You could even consider covering Stock Level with the secondary index to avoid the double-lookup (the clustering index already covers it naturally).

In case your DBMS doesn't support clustering, consider indexing {ShipID, PartID, Stock Level} (or {PartID, ShipID, Stock Level} or both). This way, your query is covered by the index and there is no need to access the table heap. Essentially, you achieve the effect of clustering, except additional space is wasted on the (redundant) table heap.


In addition to that, some DBMSes support leading-edge index compression, greatly diminishing the storage space of repeated values in indexes of junction tables. Compress a clustered table (which itself is an index) for maximal effect.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • I had a play with indexing and it looks like it something that will be quite good to learn more about. At the moment I was just experementing with the structure in access which I don't think supports clustering but I haven't had a chance to check yet. When I build the actual database it will be in MSSQL, I am still relatively new to this (only made one useful database so far). I'd like to use EF with this database so I have a lot of learning still to do. However I will take all you said in mind, and thanks for the links as well. – Dave Williams Jul 25 '12 at 07:41
  • 1
    @DaveWilliams One caveat about MS SQL Server: `PRIMARY KEY` constraints [default to `CLUSTERED`](http://msdn.microsoft.com/en-us/library/ms174979(v=sql.105).aspx). You'll need to **explicitly** specify `PRIMARY KEY NONCLUSTERED` if you want a heap-based table. Just because clustering is good for some tables doesn't mean it's good for all, and this syntax quirk can lead to over-clustered database and performance problems when people don't pay attention... – Branko Dimitrijevic Jul 25 '12 at 08:55
  • I just found that option in SSMS. It's quite deeply buried. I will have to investigate this further as well. Thank you for giving me the heads up and pointing me in the right direction. It's a complex beastie but I always think the best way to learn is through use :) thanks again. – Dave Williams Jul 25 '12 at 13:06