8

im need to design some database tables but im not sure about the performance impact. In my case its more about the read performance than for saving the data.

The situation

With the help of pattern recognition im finding out how many values of a certain object needs to be saved in my postgresql database. Amount other lets say fixed properties the only difference is if 1, 2 or 3 values of the same type needs to be saved.

Currently im having 3 entities/tables which differ only in having having 1, 2 or 3 not nullable properties of the same type.

For example:

EntityTestOne/TableOne {
    ... other (same) properties
    String optionOne;
}

EntityTestTwo/TableTwo {
    ... other (same) properties
    String optionOne;
    String optionTwo;

}

EntityTestThree/TableThree {
    ... other (same) properties
    String optionOne;
    String optionTwo;
    String optionThree;
}

I expect to have several million records in production and im thinking what could be the performance impact of this variant and what could be alternatives.

Alternatives

Other options which come into my mind:

  • Use only one entity class or table with 3 options (optionTwo and optionThree will be nullable then). If to talk of millions of expected records plus caching im asking myself isn't it a kind of 'waste' to save millions of null values in at least two (caching) layers (database itself and hibernate). In a another answer i read yesterday saving a null value in postgresql need only 1 bit what i think isnt that much if we talk about several millions of records which can contain some nullable properties (link).
  • Create another entity/table and use a collection (list or set) relationship instead

For example:

EntityOption {
    String value;
}

EntityTest {
    ... other (same) properties
    List<EntityOption> options;
}
  • If to use this relationship: What would give a better performance in case of creating new records: Creating for every new EntityTest new EntityOption's or doing a lookup before and reference a existing EntityOption if exists? What about the read performance while fetching them later and the joins which will be needed then? Compared to the variant with one plain Entity with three options i can imagine it could be slightly slower...

As im not that strong in database design and working with hibernate im interested of the pros and cons of these approaches and if there are even more alternatives. I even would like to ask the question if postgresql is the right choice for this or if should think about using another (free) database.

Thanks!

Community
  • 1
  • 1
StephanM
  • 1,350
  • 2
  • 24
  • 50
  • Slightly updated the question. Would really very much appreciate any hints/suggestions/articles/whatever on this topic... – StephanM May 12 '17 at 07:42
  • I would say that your 2nd solution (3 nullable columns in a single table) will always be superior to the 1st (3 separate tables). Your 1st solution just don't have any benefit at all (i.e. it is not a single entity, so it cannot be referenced; structure changes must be populated to 3 independent tables, etc.), while having nullable columns in Postgres doesn't hurt that much. As for your 3rd solution: it depends on your use cases/actual queries. It might worth take a look, if you typically query just a few records from `EntityTest` at a time. And it also allows you having more than 3 options. – pozs May 15 '17 at 09:04
  • 10M rows is not that much for modern hardware. Just try all options (populate with dummy data with a real distribution) and measure the time it takes to run the queries on your hardware. "If you have two horses and you want to know which of the two is faster then race your horses" - see [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) by Eric Lippert for more background. – Vladimir Baranov May 15 '17 at 11:44

2 Answers2

6

The case is pretty clear in my opinion: If you have an upper limit of three properties per object, use a single table with nullable attributes.

A NULL value does not take up any space in the database. For every row, PostgreSQL stores a bitmap that contains which attributes are NULL. This bitmap is always stored, except when all attributes are not nullable. See the documentation for details.
So don't worry about storage space in this case.

Using three different tables or storing the attributes in a separate table will probably lead to UNIONs or JOINs in your queries, which will make the queries more complicated and slow.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
1

There are many inheritance strategy for creating entity class, I think you should go with single table strategy, where there will be a discriminator column (managed by hibernate itself), and all common filed will be used by each entity and some specific fields will be use by specific entity and remain null for other entity. This will get improved read performance. For your ref. : http://www.thejavageek.com/2014/05/14/jpa-single-table-inheritance-example/

Prateek Singh
  • 1,106
  • 10
  • 18