Can someone advise me on the SQL data type that should be used for a DICOM UID, 1.2.840.113986.3.2702661254.20150220.144310.372.4424 as a sample. I would like to use it as a primary key as well.
-
Why not use `text`? – sebenalern Jun 06 '16 at 17:20
-
Just looking for the best option, is text the most efficient? – Adam Zenner Jun 06 '16 at 17:31
-
3Maybe not a good idea for a primary key. You will have to use it as `TEXT`, definitely indexed. There is no efficiency here to consider, because `TEXT` is your only option. – vitaly-t Jun 06 '16 at 17:34
-
1In case you want to apply the DICOM length constraint you could also use char(64) or varchar(64). I agree that it should not be used as a primary key. It would work fine in the study table but it is not very handy to be used as a foreign key (e.g. in a series table). In applications I had designed I modeled it as a varchar(128) to allow some buffer for violations to the length constraint. It works quite well, even in large scale databases (> 30 million images) – Markus Sabin Jun 07 '16 at 12:49
-
If you can anonymize your data, you could use this [trick](https://stackoverflow.com/questions/66226614/postgresql-convert-uuid-into-oid) – malat Nov 18 '22 at 10:46
1 Answers
There are two options available here- either use a less-than-ideal data type which already exists, of which "text" is almost certainly the best option, or implement a custom data type for this particular type of data.
While the best built-in option is "text", looking at the example provided, you would likely get significant performance and space benefits from using a custom data type, though it would require writing code to implement it.
A final option to consider is to use a surrogate key for that data. To do this, you would build a table which contains a "bigserial" column and then a "text" column. The "text" column would hold the long form of the value as you have it shown above and the "bigserial" column would provide an integer (64bit with bigserial, 32 bit if you use "serial" instead) which you would then use in all of your tables, instead of the long form.

- 680
- 4
- 7
-
Why should one implement it in that complicated way? Which benefit will be received from that. And what stands against using ordinary (portable) data types like char/varchar – Markus Sabin Jun 08 '16 at 06:18
-
Using a custom data type will require less space and will be faster. – Stephen Frost Jun 08 '16 at 12:39
-
Using a surrogate key and a side-table will reduce the size on-disk, assuming that the values are repeated often in the data. – Stephen Frost Jun 08 '16 at 12:39