Problem Statement

I have the following candidate key

{Student Id, Entity Id, Action Id......}

I'm trying to decide whether to use a composite primary key or a surrogate key (auto-increment integer, timestamp, or GUID).

Research & Analysis

I looked up a couple of questions and discussions on Stack Overflow but did not find clear direction specific to my scenario.

Should a composite key be a primary key

Composite primary key or primary surrogate key

Here are the access patterns for the table:

Write operations

Data is stored in this table in response to student actions in the web app, i.e. it can be viewed as time series data capturing user actions on different entities. Over time, I expect the number of students logged into the web app concurrently to be very high (depending on adoption of the product by the student population!). I'm therefore expecting a high number of concurrent writes and a large table size over time.

Read operations

The table is read per student: whenever a student logs in to the system, it is queried to show them an overall summary of their past actions.
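A rough sketch of that per-student read, for illustration only. The table name `student_action` and the column names are assumptions, not the real schema, and the literal `42` stands in for the logged-in student's id.

```sql
-- Hypothetical summary query: how often one student performed
-- each action on each entity. Table and column names are assumed.
SELECT entity_id,
       action_id,
       COUNT(*) AS times_performed
FROM student_action
WHERE student_id = 42
GROUP BY entity_id, action_id;
```

Whichever key design is chosen, this read filters on the student id alone, so it benefits from an index whose leading column is the student id.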

Planned Approach

Keeping in mind the access patterns, the projected size of the table, and the time series nature (events!) of the data, I'm inclined towards using a unique surrogate key (auto-increment integer) plus a composite secondary key to get the best balance of write and read performance. The alternative is to create a composite primary key and avoid a surrogate key altogether.
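The two options can be sketched as MySQL DDL. This is only an illustration: the table names, column names, and column types are assumptions, and the candidate-key columns elided in the problem statement are left out rather than guessed.

```sql
-- Option A (hypothetical names): surrogate primary key plus a
-- non-unique composite secondary index for the per-student reads.
CREATE TABLE student_action_surrogate (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_id BIGINT UNSIGNED NOT NULL,
    entity_id  BIGINT UNSIGNED NOT NULL,
    action_id  BIGINT UNSIGNED NOT NULL,
    -- remaining candidate-key columns are elided in the question
    PRIMARY KEY (id),
    KEY idx_student_entity_action (student_id, entity_id, action_id)
) ENGINE=InnoDB;

-- Option B (hypothetical names): the candidate key itself is the
-- primary key, with no surrogate column and no extra index.
CREATE TABLE student_action_composite (
    student_id BIGINT UNSIGNED NOT NULL,
    entity_id  BIGINT UNSIGNED NOT NULL,
    action_id  BIGINT UNSIGNED NOT NULL,
    -- remaining candidate-key columns are elided in the question
    PRIMARY KEY (student_id, entity_id, action_id)
) ENGINE=InnoDB;
```

In InnoDB the primary key is the clustered index. With option A, new rows are appended in `id` order and per-student reads go through the secondary index. With option B, rows are stored ordered by student, so one student's rows sit together but inserts land throughout the index rather than at its end.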

Can you validate my approach or give your suggestions on alternatives or refinements to my approach?

Alwyn - Numino Labs

1 Answer

Technically, you are better off using a composite key. The only downside is that it's not suitable for entities.

What I mean is that an M:N table may become an entity over time. It will gain columns, and its individual rows will be manipulated and referenced, which is complicated with composite keys.

A plain M:N table, on the other hand, is required to have a unique key on the referencing columns, so using that key as the primary key makes a lot of sense. You get rid of the extra ID column, its sequence, and an additional index.
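As a sketch of why referencing rows gets complicated with a composite key, consider a hypothetical M:N table that later gains a child table. All table and column names here are made up for illustration.

```sql
-- Hypothetical plain M:N table keyed by its referencing columns.
CREATE TABLE student_entity (
    student_id BIGINT UNSIGNED NOT NULL,
    entity_id  BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (student_id, entity_id)
) ENGINE=InnoDB;

-- Once individual rows are referenced, every child table has to
-- carry all columns of the composite key in its foreign key.
CREATE TABLE student_entity_note (
    note_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_id BIGINT UNSIGNED NOT NULL,
    entity_id  BIGINT UNSIGNED NOT NULL,
    note_text  VARCHAR(255) NOT NULL,
    PRIMARY KEY (note_id),
    CONSTRAINT fk_note_student_entity
        FOREIGN KEY (student_id, entity_id)
        REFERENCES student_entity (student_id, entity_id)
) ENGINE=InnoDB;
```

With a surrogate key on the parent, the child would need only a single reference column, which is the trade-off described above.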

Boris Schegolev
  • If I keep a composite key, then since the MySQL InnoDB engine uses a clustered index, couldn't a high number of concurrent writes cause a lot of re-arranging of existing records in the DB, resulting in poor write performance and thus poor response times for users? – Alwyn - Numino Labs Nov 28 '18 at 13:01
  • There should not be much difference. You would have the `UNIQUE KEY` in one case or the `PRIMARY KEY` in the other; the implementation is basically the same. Furthermore, if you are building a solution to handle thousands of transactions per second, you should not have used MySQL in the first place. – Boris Schegolev Nov 28 '18 at 14:06
  • Boris, I should correct myself. I plan to use AWS RDS Aurora Serverless (MySQL InnoDB engine). I expect Aurora to get the product to at least 100k-200k total students on the platform, with at least a few thousand able to act or log in concurrently. Also, I plan to avoid the unique key and have a secondary index instead. I expect the secondary index to be rebuilt asynchronously and not to significantly impact write performance? – Alwyn - Numino Labs Nov 28 '18 at 14:16