0

I have following association in pivot way.

| DOCID | Note1 | Note2 | Note3 |
|-------|-------|-------|-------|
|   1   |  N11  |  N21  |  N31  |
|   2   |  N12  |  NULL |  N32  |
|   3   |  N13  |  N23  |  N33  |
|   4   |  N14  |  N24  |  NULL |
|   5   |  NULL |  N25  |  N35  |

Other way of storing above is as below.

| DOCID |  Field  | Value |
|-------|---------|-------|
|   1   |  Note1  |  N11  |
|   1   |  Note2  |  N21  |
|   1   |  Note3  |  N31  |
|   2   |  Note1  |  N12  |
|   2   |  Note3  |  N32  |
|   3   |  Note1  |  N13  |
|   3   |  Note2  |  N23  |
|   3   |  Note3  |  N33  |
|   4   |  Note1  |  N14  |
|   4   |  Note2  |  N24  |
|   5   |  Note2  |  N25  |
|   5   |  Note3  |  N35  |

which of the above two option is better.

I might have more null values. in that case 2nd option seems better. as it will have less records.

but when I have 10 million records, it will be multiplied by notes (in our case it will be (30 million - null) records).

So considering performance for fetching associated records. which option is better and why?

I will have more notes associated with DocIDs.

Anonymous Creator
  • 2,968
  • 7
  • 31
  • 77
  • Keep in mind that certain DBMS's will handle pivot operations better than others. https://stackoverflow.com/questions/15931607/convert-rows-to-columns-using-pivot-in-sql-server Microsoft, Oracle, Postgres and a few others have very directed pivot functions that make the task significantly easier than say MySQL here https://stackoverflow.com/questions/7674786/mysql-pivot-table – Twelfth Mar 13 '18 at 19:38

1 Answers1

0

"Better" is often subjective. In this case, though, I think one method is generally better than the other.

The second approach is the better approach -- one row per document/note pair. In general, when you have columns that are only distinguished by a number -- but otherwise contain the same things -- then the data model is suspect. There may be good reasons for representing the data across columns, but the structure should be questioned. If you still need it, then fine.

Consider a simple query such as which ids have a particular note. In the first representation, you need to check all three columns. This makes it hard to use an index. And, it negates the value of columnar storage.

If the business changes and you suddenly want 4 notes per docid -- or want to limit them to 2 -- then the table needs to be restructured. That is an expensive process.

I'm not sure what the notes refer to. But if they represent a foreign key relationship to another table, then the pivoted version needs to maintain multiple foreign key relationships -- for essentially the same purpose.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786