0

We are creating a Data Mart consisting of around 8 Star Schemas, each representing a business event, process or reporting set of data. We have an equal mix of Transaction, Snapshot and Accumulating Snapshot schemas. The value we are intending to use in most fact tables as the primary key is ContractID however in the source DB this is a string in this format "X12345.001".

I have read a couple of contradictory opinions on the use of strings as PKs:

Strings as Primary Keys in SQL Database - The top answer in this article suggests it is ok but the question is not Data Mart specific.

String as primary KEY? - The top answer in this article is dead set against the use of strings as PKs.

The Data Mart content will never be huge (low millions of rows across all stars, never billions). It would be less effort to use the current PK (ContractID) as this will be consistent across all tables but I am worried about performance. The question is whether a string can be used as a primary key in a Data Mart.

Thanks

Community
  • 1
  • 1
john
  • 145
  • 1
  • 4
  • A string CAN be used but should it be used?The answer is no if you can help it. – Mihai Sep 08 '14 at 16:18
  • Your fact tables have only one row per ContractID? I would have thought that unlikely but if you are right then there's nothing wrong with making it a primary key. – nvogel Sep 08 '14 at 21:05

1 Answers1

1

In the end we stored the Contract ID in a dimension which is probably right as the ID is more of a description of the contract though it is unique) with the SQL generated ID of the fact table acting as the FK in the fact table. The fact table does hold multiple rows per contract as you expect with an acumulating snapshot.

We let SQL handle all IDs and have restricted everything in the fact table to integers, the performance is blistering in our opinion!

john
  • 145
  • 1
  • 4