0

I've many pairs of tables and I want to efficiently determine which pairs have tables that are related to each other by PK-FK relationships.

  1. If PK-FK relationship is documented, I'll leverage that but I cannot assume that it's always documented. (For example, if the source is flat file etc.)
  2. In cases where it's not documented, I need a way to efficiently determine if the two tables are related.
  3. I'll need to work with composite keys if the tables in the pair is not related by non-composite keys.

What I'm looking for is one of below things:

  1. Any existing algorithm.
  2. Any research paper.
  3. Any ideas/pointers.
  4. Any of similar works available online.
  5. Book suggestions etc.

For Example:

Suppose I've below pairs of tables:

{Table1, Table2}
{Table3, Table4}
{Table5, Table6}

and many more...

Out of the above pairs, I want to return the pair {Table3, Table4} if the tables Table3 and Table4 are related to each other by PK-FK relationship. To return the correct pair I need an efficient way to determine if the tables are related or not (assuming the relationship is not documented).

NOTE: I'm looking for data based discovery for PK-FK relationships between the tables and not just metadata based discovery (because in some cases metadata may not be available).

Many thanks in advance.

  • Possible duplicate of [Tools for discovering de facto foreign keys in databases?](http://stackoverflow.com/questions/7031203/tools-for-discovering-de-facto-foreign-keys-in-databases) – philipxy Dec 18 '16 at 06:33
  • Besides the duplicate above, keep googling, eg '(find OR discover) ("foreign keys" OR dependencies OR relationships OR relations) in database stackoverflow.com'. Also many questions that just wanted metadata queries got answers about discovery. – philipxy Dec 18 '16 at 06:34

1 Answers1

0

Disclaimer: I am an author of one such algorithm, Linkifier. Linkifier estimates FK constraints based on metadata (like the similarity of column names and their data types) and column statistics (like average column width in bytes).

Before more is said, it is important to note that FK constraint identification is generally a tough problem - there are reasons why DBs do not infer the FK constraints but a user must provide them. Hence, do not expect the estimates to ever be 100% accurate.

A general recommended workflow if you want to set the constraints in the database:

  1. Make sure that the columns have reasonable data types and names. DBLint may help.
  2. If the database is small, tell the database to calculate statistics on all columns in the database. This will improve the accuracy of the PK and FK constraint estimates. However, the statistics calculation on a large and complex database can take a considerable amount of time. On the other end, the statistics do not have to be accurate (calculated on all the data) - estimates (based on subsamples of the data) are enough. Consult the documentation of your database vendor.
  3. If the PKs are not set in the database, let Linkifier estimate the PKs. After reviewing the estimates, execute the generated SQL alter queries on the database. The knowledge of the true PKs will then improve the accuracy of the FK constraint estimation because one side of the FK constraints will be known to be correct. Warning: Setting up and enforcing PKs can be time-consuming (particularly on large transaction tables).
  4. If you have added PKs into the database, rerun Linkifier to get the actualized estimates of the FK constraints.
  5. After careful reviving of the FK constraint estimates, execute the generated alter queries on the database. Warning: Blind application of the estimated FK constraints to the database is discouraged as false positives may actually be accepted by the database as valid. Also, setting up and enforcing FK constraints can be time-consuming.

If you just want to generate an ER diagram, export the estimates from Linkifier and visualize them with yEd (a suitable layout algorithm for ER diagrams is BPMN) or with Oracle Data Modeler.

user824276
  • 617
  • 1
  • 7
  • 20