Is redundant data an acceptable trade-off in a normalized database structure?

Question

In SQL I'm considering the following problem.

I have a list of A_ids and a list of B_ids.

The idea is that each A_id has a list of B_ids, potentially with many B_ids in each list (a many-to-many relationship).

I could simply store them in this format:

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...

However, I've read that normalizing, i.e. simply doing:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...

is better practice, but I fear the impact of having a huge number of rows (i.e. 1,000,000,000+).
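A minimal sketch of the normalized (junction-table) layout, using Python's built-in `sqlite3` module purely for illustration. The table name `a_b` and index name `idx_a_b_b_a` are hypothetical, and the data is the lists shown above for a_id 1 and 2. The composite primary key on (a_id, b_id) rejects duplicate pairs and serves lookups by a_id. A second index on (b_id, a_id) serves the reverse lookup.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")

# Hypothetical junction table: one row per (a_id, b_id) pair.
# The composite primary key rejects duplicate pairs and indexes lookups by a_id.
conn.execute("""
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)
    )
""")
# Hypothetical secondary index for the reverse question: "which A_ids have this B_id?"
conn.execute("CREATE INDEX idx_a_b_b_a ON a_b (b_id, a_id)")

# The first two lists from the question, expanded into one row per pair.
lists = {1: [1, 2, 3, 4, 5], 2: [1, 2, 4, 5]}
conn.executemany(
    "INSERT INTO a_b (a_id, b_id) VALUES (?, ?)",
    [(a, b) for a, bs in lists.items() for b in bs],
)

def b_ids_for(a_id):
    """Return the sorted B_ids associated with one A_id."""
    rows = conn.execute(
        "SELECT b_id FROM a_b WHERE a_id = ? ORDER BY b_id", (a_id,)
    )
    return [b for (b,) in rows]

def a_ids_for(b_id):
    """Return the sorted A_ids associated with one B_id."""
    rows = conn.execute(
        "SELECT a_id FROM a_b WHERE b_id = ? ORDER BY a_id", (b_id,)
    )
    return [a for (a,) in rows]

print(b_ids_for(2))  # [1, 2, 4, 5]
print(a_ids_for(3))  # [1]
```

The reverse index is what keeps "which A_ids have this B_id?" from reading the whole table. Whether it is worth its extra storage at a billion rows depends on whether that query is actually run.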

I understand the drawbacks of each, but which is the better trade-off?

Yes, it's generally considered an acceptable tradeoff. The problems with the first format are pretty severe. Relational databases are designed to support lots of rows. — Barmar, Jun 12 '23 at 16:04
You might like to read my list of drawbacks of storing the comma-separated list: https://stackoverflow.com/a/3653574/20860 — Bill Karwin, Jun 12 '23 at 16:08
Does this answer your question? [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) — philipxy, Aug 11 '23 at 22:08
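To make one of the drawbacks mentioned in these comments concrete, here is a small sketch (again Python's `sqlite3`). It uses a hypothetical table `a_csv` holding the comma-separated lists from the question, plus a hypothetical extra row 7 whose list is "12". A naive substring search on the column returns a false match. The usual workaround, padding both sides with commas, is needed just to get a correct answer. Neither pattern can use an ordinary B-tree index because of the leading `%` wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table storing each A_id's B_ids as one comma-separated string.
conn.execute("CREATE TABLE a_csv (a_id INTEGER PRIMARY KEY, b_ids TEXT)")
conn.executemany(
    "INSERT INTO a_csv (a_id, b_ids) VALUES (?, ?)",
    [
        (1, "1,2,3,4,5"), (2, "1,2,4,5"), (3, "1"),
        (4, "1,2"), (5, "3,4"), (6, "2,3"),
        (7, "12"),  # hypothetical extra row, not from the question
    ],
)

def a_ids_naive(b_id):
    """Plain substring match: searching for 2 also matches '12'."""
    rows = conn.execute(
        "SELECT a_id FROM a_csv WHERE b_ids LIKE ? ORDER BY a_id",
        (f"%{b_id}%",),
    )
    return [a for (a,) in rows]

def a_ids_padded(b_id):
    """Pad the list and the pattern with commas so only whole elements match."""
    rows = conn.execute(
        "SELECT a_id FROM a_csv WHERE ',' || b_ids || ',' LIKE ? ORDER BY a_id",
        (f"%,{b_id},%",),
    )
    return [a for (a,) in rows]

print(a_ids_naive(2))   # [1, 2, 4, 6, 7]  (a_id 7 is a false match)
print(a_ids_padded(2))  # [1, 2, 4, 6]
```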

score 1 · Accepted Answer · answered Jun 12 '23 at 16:04

Normalisation is the route to follow.

For a modern DBMS, that's not a particularly large number of rows.
Because you would index the table appropriately, each query would only read the rows it actually needs rather than doing a full table scan (unless the query genuinely requires one).
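One way to check the point about indexes is to ask the database for its query plan. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` against a hypothetical junction table `a_b`. Other DBMSs have their own `EXPLAIN` variants with different output. The exact plan wording varies between SQLite versions, so the code only looks at whether each plan line reports a SEARCH (index lookup) or a SCAN (a full pass over the table or an index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical junction table; the composite primary key creates an index on (a_id, b_id).
conn.execute(
    "CREATE TABLE a_b (a_id INTEGER NOT NULL, b_id INTEGER NOT NULL, "
    "PRIMARY KEY (a_id, b_id))"
)

def plan(sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN output."""
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]

by_a = "SELECT b_id FROM a_b WHERE a_id = ?"
by_b = "SELECT a_id FROM a_b WHERE b_id = ?"

plan_a = plan(by_a)          # expected: a SEARCH using the primary-key index
plan_b_before = plan(by_b)   # expected: a SCAN, since no index starts with b_id
print(plan_a)
print(plan_b_before)

# A hypothetical index leading with b_id turns the reverse lookup into a SEARCH too.
conn.execute("CREATE INDEX idx_a_b_b_a ON a_b (b_id, a_id)")
plan_b_after = plan(by_b)
print(plan_b_after)
```

On a server DBMS the equivalent check would use that system's own `EXPLAIN`. The output format differs, but the distinction between an index lookup and a full scan is the same idea.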
