Problem overview
When adopting an Inmon style 3NF Enterprise Data Model, what are some common techniques for handling surrogate keys and referential integrity? In my case I have to populate a 3NF data model that provides an "enterprise view" of several transactional systems. In addition, each OLTP is distributed so that there is one instance per country. Thus, the challenge I'm currently faced with is consolidating each source system into a unified data model.
Actual question
Because each country has its own "local" PK I need a strategy for handling conflicts when consolidating these into the EDW. Is it most common to simply create a composite key in this case? e.g. source_id + source_country or would it better practice to generate a surrogate key here?
For example:
A.foobar
id
description
...
B.foobar
id
description
...
Would become:
EDW.foobar
id
foobar_id
source_country
description
So that in the consolidated data model we end up with a new surrogate key (id) which uniquely identifies each source record (foobar_id + source_country). This seems logical but feels wrong for some reason. And furthermore, hence my question, what impact will this then have on handling referential integrity in the EDW? i.e. if we generate new surrogate keys between source 3NF & EDW 3NF then there's the added complexity of referencing these new keys throughout the EDW schema. In terms of ETL implementation, it would mean having to lookup the newly generated surrogate key via the existing FK (source system) and then replacing it as the new FK. This means maintaining multiple FKs (one to look up the new surrogate key and the new surrogate key itself) in the EDW which seems very far fetched.
If anybody has experience with this problem then I'd appreciate your advice as I don't think my current approach is going to work. There are also several corollary topics e.g. versioning and historization, as well as cdc between EDW 3NF and data marts, that also come into play here but I'll come back to those later.
N.B.
Most of the research I've conducted relates specifically to populating Kimball style data marts, rather than Inmon's 3NF Enterprise Data Model. Furthermore, I've struggled to find anything useful on the topic of consolidating distributed databases whereby the underlying schema is the same.