
I am trying to understand whether it is better to use one big table or many smaller ones. I have read on the web that it depends on each case, so I would like any recommendations based on my example below:

Let's assume I would like to create a database that will hold all characteristics of assets. Assets can be split into different asset classes (e.g. equity, bonds, cash). All asset classes share some characteristics (e.g. ID Code, Issuer Name), and some assets have specific characteristics (bonds have a Maturity Date while equities don't). So my question is: should I use one big table that leaves NULL where a characteristic does not apply to the asset (as shown below)?

(Image: all characteristics in one table)

Or should I use several tables and join them every time I need to produce a report (as shown below)?

(Image: characteristics in multiple tables)

What are the pros and cons of each approach? For example, if I have 1 million different ID codes, would the single-table option need additional space, while query performance drops significantly with the multi-table option?

What if each asset class has 50 unique characteristics and there are 10 different asset classes? Should I create a 50 x 10 = 500 column table (with most of the columns in each row being NULL), or should I have 10 different tables and use LEFT JOIN when I want to create a report?
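To make the two options concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`asset_wide`, `asset`, `bond`, `equity`, `dividend_yield`) are hypothetical, and only one class-specific column per class is shown instead of 50. It builds both layouts with the same two assets and checks that the single-table SELECT and the LEFT JOIN report return the same rows.

```python
import sqlite3

# Hypothetical schema for illustration: a few common columns plus one
# class-specific column per asset class (maturity_date for bonds,
# dividend_yield for equities).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: one wide table; inapplicable columns are left NULL.
CREATE TABLE asset_wide (
    id_code        TEXT PRIMARY KEY,
    issuer_name    TEXT NOT NULL,
    asset_class    TEXT NOT NULL,
    maturity_date  TEXT,   -- bonds only
    dividend_yield REAL    -- equities only
);

-- Option 2: a base table for the common columns plus one table per class.
CREATE TABLE asset (
    id_code     TEXT PRIMARY KEY,
    issuer_name TEXT NOT NULL,
    asset_class TEXT NOT NULL
);
CREATE TABLE bond (
    id_code       TEXT PRIMARY KEY REFERENCES asset(id_code),
    maturity_date TEXT NOT NULL
);
CREATE TABLE equity (
    id_code        TEXT PRIMARY KEY REFERENCES asset(id_code),
    dividend_yield REAL
);
""")

rows = [
    ("B1", "Acme Corp", "bond", "2030-06-30", None),
    ("E1", "Acme Corp", "equity", None, 0.021),
]
conn.executemany("INSERT INTO asset_wide VALUES (?, ?, ?, ?, ?)", rows)

# Split the same data into the base table and the per-class tables.
for id_code, issuer, cls, maturity, yld in rows:
    conn.execute("INSERT INTO asset VALUES (?, ?, ?)", (id_code, issuer, cls))
    if cls == "bond":
        conn.execute("INSERT INTO bond VALUES (?, ?)", (id_code, maturity))
    elif cls == "equity":
        conn.execute("INSERT INTO equity VALUES (?, ?)", (id_code, yld))

# Option 1 report: a plain SELECT.
wide_report = conn.execute(
    "SELECT id_code, issuer_name, asset_class, maturity_date, dividend_yield "
    "FROM asset_wide ORDER BY id_code"
).fetchall()

# Option 2 report: LEFT JOIN each class table back onto the base table.
joined_report = conn.execute("""
    SELECT a.id_code, a.issuer_name, a.asset_class,
           b.maturity_date, e.dividend_yield
    FROM asset AS a
    LEFT JOIN bond   AS b ON b.id_code = a.id_code
    LEFT JOIN equity AS e ON e.id_code = a.id_code
    ORDER BY a.id_code
""").fetchall()

print(wide_report == joined_report)  # True
```

With 10 asset classes the second query simply grows to 10 LEFT JOINs on the shared key; whether that is slower in practice depends on the DBMS, indexes and row counts.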

Gizazas
  • This question isn't really about the number of `NULL` values in a table. It's more about few tables-many columns versus many tables-few columns. And that's a design, requirement and use-case issue, which can't easily be answered in a generic way. Pro for few tables-many columns: it's easy to get all data in a single query. Cons: maybe you don't *need* all the data in a single query? Or want more local indexes? – Some programmer dude Feb 07 '18 at 11:52
  • If you need to query those entities together frequently, you can consider creating a single table. Your requirement shows that they have something in common. But you can also handle these queries by using UNION operators, etc. Separating them into different tables makes each entity easier to maintain. – Eralper Feb 07 '18 at 11:56
  • Depending on the DBMS you are using you might want to consider a single `assets` table with columns for the common characteristics and putting the individual ones into a JSON column. You lose the type safety of the columns that way but it's probably easier to handle –  Feb 07 '18 at 12:34
  • Does this answer your question? [How can you represent inheritance in a database?](https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database) – philipxy Jan 17 '22 at 11:03
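The JSON-column idea from the comments can be sketched as follows, again with Python's `sqlite3` and hypothetical names (`assets`, `extra`, `load_asset`). Common characteristics stay as typed columns; class-specific ones are serialized into one JSON text column. Many DBMSs can also query inside JSON directly (for example PostgreSQL's `jsonb` or SQLite's JSON functions); this sketch parses the JSON in Python so it doesn't depend on those features.

```python
import json
import sqlite3

# Hypothetical table: common characteristics as typed columns, class-specific
# ones serialized into a single JSON text column ("extra").
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assets (
        id_code     TEXT PRIMARY KEY,
        issuer_name TEXT NOT NULL,
        asset_class TEXT NOT NULL,
        extra       TEXT NOT NULL  -- JSON; the database does not type-check its keys
    )
""")

assets = [
    ("B1", "Acme Corp", "bond", {"maturity_date": "2030-06-30", "coupon": 0.045}),
    ("E1", "Acme Corp", "equity", {"dividend_yield": 0.021}),
]
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [(i, n, c, json.dumps(extra)) for i, n, c, extra in assets],
)

def load_asset(id_code):
    """Return one asset as a flat dict, merging the JSON attributes in."""
    row = conn.execute(
        "SELECT id_code, issuer_name, asset_class, extra FROM assets WHERE id_code = ?",
        (id_code,),
    ).fetchone()
    asset = {"id_code": row[0], "issuer_name": row[1], "asset_class": row[2]}
    asset.update(json.loads(row[3]))
    return asset

print(load_asset("B1")["maturity_date"])    # 2030-06-30
print("maturity_date" in load_asset("E1"))  # False
```

Equities simply have no `maturity_date` key instead of a NULL column, which is the trade-off the comment mentions: no wasted columns, but no column-level typing either.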

1 Answer


You have a real question with good details on a particular application. I'm not sure if it is too broad or seeking an opinion. The question itself is hard to answer, but let me try to give you guidelines on answering it.

What you have is (potentially) a one-of relationship. The first question: At which level will you have foreign key relationships? If the general relationship is to the "asset", then that needs a primary key, leaning toward a single table. If each asset class will have its own relationship, then each needs a primary key, leaning toward multiple tables.

Of course, you might expect foreign key relationships to both. And there are ways to solve that.
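One common way to allow foreign keys at both levels is a shared primary key: each asset-class table uses the base table's key as its own primary key and also as a foreign key to the base table. Below is a minimal sketch in Python's `sqlite3`; the table names (`asset`, `bond`, `holding`, `coupon_payment`) are hypothetical. `holding` references any asset, while `coupon_payment` can only reference bonds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE asset (
    id_code     TEXT PRIMARY KEY,
    asset_class TEXT NOT NULL
);
-- The subtype's primary key is also a foreign key to the base table.
CREATE TABLE bond (
    id_code       TEXT PRIMARY KEY REFERENCES asset(id_code),
    maturity_date TEXT NOT NULL
);
-- A relationship at the general "asset" level.
CREATE TABLE holding (
    portfolio TEXT NOT NULL,
    id_code   TEXT NOT NULL REFERENCES asset(id_code),
    quantity  REAL NOT NULL
);
-- A relationship that only makes sense for bonds.
CREATE TABLE coupon_payment (
    id_code  TEXT NOT NULL REFERENCES bond(id_code),
    pay_date TEXT NOT NULL,
    amount   REAL NOT NULL
);
""")

conn.execute("INSERT INTO asset VALUES ('B1', 'bond')")
conn.execute("INSERT INTO asset VALUES ('E1', 'equity')")
conn.execute("INSERT INTO bond VALUES ('B1', '2030-06-30')")
conn.execute("INSERT INTO holding VALUES ('P1', 'E1', 100)")                  # any asset
conn.execute("INSERT INTO coupon_payment VALUES ('B1', '2025-06-30', 22.5)")  # bonds only

try:
    # 'E1' is an asset but not a bond, so this violates the FK to bond.
    conn.execute("INSERT INTO coupon_payment VALUES ('E1', '2025-06-30', 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

This alone does not stop the same key from appearing in two subtype tables, or `asset_class` disagreeing with the subtype table; enforcing that needs extra constraints (for example a composite key that includes the asset class).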

The second issue is performance. In general, wider tables (especially those with fixed-length columns) occupy space even when the columns are empty. So, if you have lots and lots of columns, all that empty space means fewer rows fit on each page, queries have to read more data, and performance can suffer.
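A back-of-envelope estimate using the question's numbers (1 million ID codes, 10 asset classes, 50 specific characteristics each) shows the scale involved. The 8-byte column width is an assumption, and so is the premise that every unused column reserves its full width; that holds for fixed-length columns in some engines (such as ordinary fixed-length columns in SQL Server), while PostgreSQL, for example, records NULLs in a per-row bitmap instead.

```python
# Back-of-envelope estimate under stated assumptions (not a measurement):
# every class-specific column is fixed-length and reserves its full width
# even when NULL.
rows = 1_000_000          # 1m ID codes, from the question
asset_classes = 10
specific_per_class = 50
bytes_per_column = 8      # assumption: e.g. a DATE/FLOAT-sized fixed column

wide_columns = asset_classes * specific_per_class     # 500 specific columns
unused_per_row = wide_columns - specific_per_class    # 450 NULL columns per row
wasted_bytes = rows * unused_per_row * bytes_per_column

print(wide_columns, unused_per_row)  # 500 450
print(wasted_bytes / 1e9, "GB")      # 3.6 GB
```

Under these assumptions roughly 90% of the class-specific storage in the wide table would be empty, which is the kind of waste that sparse or variable-length storage is meant to avoid.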

The third issue is specific functionality of the database. For instance, Postgres supports table inheritance, which fits your data model closely. The "asset" table can hold the general attributes, and each "child" table can add its own attributes.

Or, SQL Server supports sparse columns, which can mitigate the effect of wasted space in very wide tables.

A fourth issue is security and access rights. Do different asset classes have different access requirements? Granting access at the table level is generally simpler than granting it at the row level.

A fifth issue involves data modifications. Is all the data inserted at the same time? Do different assets have different update schedules? How does this feed into end-user queries? This can suggest storing the data in separate tables, to isolate updates. Or even having an information table to keep track of the most recent updates for each asset class.
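The "information table" idea could look something like this minimal sketch in Python's `sqlite3`. The table `asset_class_update` and the helper `record_update` are hypothetical names; each asset class keeps one row holding the timestamp of its most recent load, overwritten on every update.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE asset_class_update (
        asset_class  TEXT PRIMARY KEY,
        last_updated TEXT NOT NULL   -- ISO-8601 UTC timestamp
    )
""")

def record_update(asset_class, when=None):
    """Insert or overwrite the most recent update time for one asset class."""
    when = when or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO asset_class_update VALUES (?, ?)",
            (asset_class, when.isoformat()),
        )

record_update("bond", datetime(2018, 2, 7, 9, 0, tzinfo=timezone.utc))
record_update("equity", datetime(2018, 2, 7, 17, 30, tzinfo=timezone.utc))
record_update("bond", datetime(2018, 2, 8, 9, 0, tzinfo=timezone.utc))  # overwrites

status = dict(conn.execute(
    "SELECT asset_class, last_updated FROM asset_class_update ORDER BY asset_class"
))
print(status["bond"])  # 2018-02-08T09:00:00+00:00
```

In a real load you would likely write the asset rows and this timestamp in the same transaction, so reports never see new data with a stale "last updated" value.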

So, as detailed as your question is, there are other considerations that feed into the decision.

Gordon Linoff