-1

In SQL I'm considering the following problem.

I have a list of A_ids and a list of B_ids.

  • the number of unique A_ids ~ 1.000s
  • the number of unique B_ids ~ 1.000.000s

The idea is that I for each A_id have a list of B_ids, with potentially many B_ids in this list (many to many).

I could simply store them in the format

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...

I however read that normalization i.e. simply doing:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...

is better practice but I fear the impact of having a huge amount of rows (i.e. 1.000.000.000+)

I understand the drawbacks with either but what is the better tradeoff?

Bjarke Kingo
  • 400
  • 7
  • 14
  • Yes, it's generally considered an acceptable tradeoff. The problems with the first format are pretty severe. Relational databases are designed to support lots of rows. – Barmar Jun 12 '23 at 16:04
  • 1
    You might like to read my list of drawbacks of storing the comma-separated list: https://stackoverflow.com/a/3653574/20860 – Bill Karwin Jun 12 '23 at 16:08
  • 1
    Does this answer your question? [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – philipxy Aug 11 '23 at 22:08

1 Answers1

1

Normalisation is the route to follow

  1. For a modern DBMS, that’s not a particularly large number of rows
  2. As you would index the table appropriately, you would only access the rows in the table actually used by any query rather than do a full table scan (unless your query requires a full table scan)
NickW
  • 8,430
  • 2
  • 6
  • 19