
I have a table that has 50 fields:

  1. 10 fields that are almost always needed.
  2. 40 fields that are very rarely needed.

I would roughly estimate that the fields in (1) are accessed 1000 times more frequently than the fields in (2).

Should I split them to two tables with one-to-one relation, or keep all in the same table?

ErezO

2 Answers


The process that you are describing is sometimes referred to as "vertical partitioning". Taken to an extreme (one column per vertical partition), this is how columnar databases store data. Unfortunately (to the best of my knowledge), Postgres does not currently have direct support for vertical partitioning.

Your idea of splitting the data into two tables is fine. I would note the following:

  • You will need to modify queries that use the extra columns to use the second table. (You can wrap the join into a view which you use when you want the extra columns.)
  • If both tables share the same primary key, then the join between them should be really fast.
  • If you are inserting/updating/deleting data, then you need to be careful about synchronization. I think you can handle this with an INSTEAD OF trigger on a view combining the tables.
  • If some records do not have extra columns, this can be a big win on the space side.
  • If all records and all columns are going to be loaded into the cache, then this probably is not a big win.
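A minimal sketch of that layout in Postgres might look like the following (table and column names are hypothetical, standing in for your actual 10 hot and 40 cold fields):

```sql
-- Hot columns: the ~10 fields that are read constantly.
CREATE TABLE item (
    item_id  bigint PRIMARY KEY,
    name     text NOT NULL,
    status   text NOT NULL
    -- ... remaining frequently used columns
);

-- Cold columns: the ~40 rarely used fields, keyed one-to-one to item.
CREATE TABLE item_extra (
    item_id    bigint PRIMARY KEY REFERENCES item (item_id),
    setting_a  text,
    setting_b  text
    -- ... remaining rarely used columns
);

-- A view presenting the original 50-column shape for the rare queries
-- that need everything.
CREATE VIEW item_full AS
SELECT i.*, e.setting_a, e.setting_b
FROM item i
LEFT JOIN item_extra e USING (item_id);
```

Reads that only touch the hot columns go straight to `item`; writes through `item_full` would need an INSTEAD OF trigger routing each column to the right underlying table, as noted above.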

This can be a big performance win, under some circumstances. But there is additional manual work to keep the tables synchronized.

Gordon Linoff
    In my case, there's also a conceptual difference between the fields in (1) and (2). The fields in (2) are more like "Additional settings" for that table which have a different edit form. So for example, they can't be modified together which makes life much easier. – ErezO Jul 27 '15 at 18:37

There's really not nearly enough information here to estimate (never mind actually quantify) what the benefits might be, but the costs are very clear -- more complex code, a more complex schema, probably greater overall space usage, and a performance overhead when adding and removing rows.

A performance improvement might come from scanning less data during a full table scan, from an increased likelihood of finding data blocks in memory when they are needed, and from an overall smaller memory footprint. But without specific information on the types of operation commonly performed, and on whether the server is under memory pressure, no reliable advice can be given.
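If you want to ground the decision in numbers first, Postgres can report how much data is actually involved before any restructuring (the table name here is a placeholder for yours):

```sql
-- Total on-disk size of the table, including TOAST data and indexes.
SELECT pg_size_pretty(pg_total_relation_size('mytable'));

-- Average stored width (bytes) of each column, from planner statistics,
-- to see how much of each row the rarely used fields account for.
SELECT attname, avg_width
FROM pg_stats
WHERE tablename = 'mytable';
```

Comparing the combined `avg_width` of the 40 rarely used columns against the full row width gives a rough upper bound on how much a full scan of the hot table could shrink.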

Be very wary of making your system more complex as a side-effect of uncertain performance gains.

David Aldridge
  • In my case, the increase in complexity is limited because the fields in (1) and (2) are conceptually different, meaning I would describe them to a person as two different tables. The only reason I'm keeping them together is because of Jeff's influential post – ErezO Jul 27 '15 at 18:42
  • Would you expect there to be a one-to-one correlation between the rows in the potential two tables? – David Aldridge Jul 27 '15 at 21:16
  • Yes, there can never be more, and they can never be empty, so I won't win on disk space either. The main question is the performance side; both options are just as easy to maintain. – ErezO Jul 28 '15 at 03:55
  • Have you identified a performance problem with the current schema? – David Aldridge Jul 28 '15 at 10:10