
I have a table that has 50 fields:

  1. 10 fields that are almost always needed.
  2. 40 fields that are very rarely needed.

I would roughly estimate that the fields in (1) are accessed 1000 times more frequently than the fields in (2).

Should I split them to two tables with one-to-one relation, or keep all in the same table?

ErezO

2 Answers


The process that you are describing is sometimes referred to as "vertical partitioning". Taken to an extreme (one column per vertical partition), this is how columnar databases store data. Unfortunately (to the best of my knowledge), Postgres does not currently have direct support for vertical partitioning.

Your idea of splitting the data into two tables is fine. I would note the following:

  • You will need to modify queries that use the extra columns to use the second table. (You can wrap the join into a view which you use when you want the extra columns.)
  • If both tables share the same primary key, then the join between them should be really fast.
  • If you are inserting/updating/deleting data, then you need to be careful about synchronization. I think you can handle this with an INSTEAD OF trigger on a view combining the tables.
  • If some records do not have extra columns, this can be a big win on the space side.
  • If all records and all columns are going to be loaded into the cache, then this probably is not a big win.
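A minimal sketch of that layout in Postgres might look like the following (table and column names are hypothetical, standing in for your actual 10 hot and 40 cold fields):

```sql
-- Hot columns: the ~10 fields that are read constantly.
CREATE TABLE item (
    item_id  bigint PRIMARY KEY,
    name     text NOT NULL,
    status   text NOT NULL
    -- ... remaining frequently used columns
);

-- Cold columns: the ~40 rarely used fields, keyed one-to-one to item.
CREATE TABLE item_extra (
    item_id    bigint PRIMARY KEY REFERENCES item (item_id),
    setting_a  text,
    setting_b  text
    -- ... remaining rarely used columns
);

-- A view presenting the original 50-column shape for the rare queries
-- that need everything.
CREATE VIEW item_full AS
SELECT i.*, e.setting_a, e.setting_b
FROM item i
LEFT JOIN item_extra e USING (item_id);
```

Reads that only touch the hot columns go straight to `item`; writes through `item_full` would need an INSTEAD OF trigger routing each column to the right underlying table, as noted above.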

This can be a big performance win, under some circumstances. But there is additional manual work to keep the tables synchronized.

Gordon Linoff
    In my case, there's also a conceptual difference between the fields in (1) and (2). The fields in (2) are more like "Additional settings" for that table which have a different edit form. So for example, they can't be modified together which makes life much easier. – ErezO Jul 27 '15 at 18:37

There's really not nearly enough information here to estimate (never mind actually quantify) what the benefits might be, but the costs are very clear -- more complex code, a more complex schema, probably greater overall space usage, and a performance overhead when adding and removing rows.

A performance improvement might come from scanning less data during a full table scan, from an increased likelihood of finding data blocks in memory when they are needed, and from an overall smaller memory footprint. But without specific information on the types of operation commonly performed, and on whether the server is under memory pressure, no reliable advice can be given.
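If you want to ground the decision in numbers first, Postgres can report how much data is actually involved before any restructuring (the table name here is a placeholder for yours):

```sql
-- Total on-disk size of the table, including TOAST data and indexes.
SELECT pg_size_pretty(pg_total_relation_size('mytable'));

-- Average stored width (bytes) of each column, from planner statistics,
-- to see how much of each row the rarely used fields account for.
SELECT attname, avg_width
FROM pg_stats
WHERE tablename = 'mytable';
```

Comparing the combined `avg_width` of the 40 rarely used columns against the full row width gives a rough upper bound on how much a full scan of the hot table could shrink.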

Be very wary of making your system more complex as a side-effect of uncertain performance gains.

David Aldridge
  • In my case, the increase in complexity is limited because the fields in (1) and (2) are conceptually different, meaning I would describe them to a person as two different tables. The only reason I'm keeping them together is because of Jeff's influential post – ErezO Jul 27 '15 at 18:42
  • Would you expect there to be a one-to-one correlation between the rows in the potential two tables? – David Aldridge Jul 27 '15 at 21:16
  • Yes, there can never be more, and they can never be empty, so I won't win on disk space either. The main question is the performance side; both options are just as easy to maintain. – ErezO Jul 28 '15 at 03:55
  • Have you identified a performance problem with the current schema? – David Aldridge Jul 28 '15 at 10:10