What is the optimal relational database design for storing an unknown number of similar but unique entities

Question

The database we are designing allows users to authenticate with multiple 3rd party services, mostly social media (twitter, facebook, etc). There will be an unknown and growing number of these services. Each service requires a unique set of data for authentication that is not standard with the other services.

One user may authenticate many services, but they may only authenticate with one of each type of service.

Possible Solutions:

A) The most direct solution to this issue is to simply add a column for each service to the user table which contains the JSON authentication for that service. However, this violates normalization by leaving a large number of nulls in the database. What happens when there are 50 of these integrations for instance?

B) Each service gets its own table in the database. JSON is no longer needed as each field can be properly described. Then a lookup table is needed "user_has_service" for each service. This is a table which contains only two foreign keys, one for the user and one for the service, linking them together. This option seems the most correct but is very inefficient and will take many operations to determine what services a user has, increasing with the number of services. I believe also in this case, the ID field for the lookup table would need to be some kind of hash of the user and service together so that duplicate inserts are not possible.

Not at all a database expert and I have been grappling with this one for quite a while. Any thoughts?

Possible duplicate of [How can you represent inheritance in a database?](https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database) — philipxy, Oct 11 '19 at 03:03
@philipxy No it is not. 1) data should be viewed; analysed; designed through data analysis & the *Relational Model*, not through the myopic non-data lens of OO. 2) Nevertheless, in simple terms, OO inheritance translates to Relational Subtypes. 3) This question has nothing to do with OO inheritance (although OO inheritance or any other OO boondogle can be applied to it), there is no OO inheritance here. The inheritance (English, math, Codd definition, not the OO definition) here lies in the Relational Key: refer my answer. — PerformanceDBA, Oct 17 '19 at 01:29
@PerformanceDBA The linked question uses the word "inheritance" but it & its answers are about dealing with subtypes relationally. (Although inheritance happens to involve special cases of subtyping.) — philipxy, Oct 17 '19 at 01:50
@philipxy. Yes, I know. The only type of OO inheritance that is applicable in a database context is Subtypes, and that is absent in this question. Which is why this Q is not a dupe of that Q. Please read my comment again. — PerformanceDBA, Oct 17 '19 at 08:23

PerformanceDBA · Accepted Answer · 2019-11-17T22:21:14.970

A) The most direct solution ... JSON

You are right, option A is grossly incorrect. It breaks Codds' First Normal Form, thus it is not Relational. NULL in the database is an indication of incomplete Normalisation, which leads to complex SQL code. To be avoided at all costs.

similar but unique

To be clear, that they are unique to the Service is true. That {LoginName; UserName; Email; UserId; etc} are all similar is true in the implementation sense only, not in the data.

I may need to sketch this out.

That is a great idea. A visual data model is far more effective, because (a) the mind can comprehend it much better than text, and (b) therefore work out details; contradictions; missing bits; etc. Much easier to progress each iteration visually, than with text.

Second, we have had visual modelling tools since 1987 (1984 for a closed group), which have been made a Standard in 1993. Hopefully you appreciate that a standard-compliant model is better than a home-grown or corporate-supplied one. It displays all technical details rather than a small subset.

Is there a name for this strategy

It is plain old Relational Data Modelling, which includes Normalisation (ensuring compliance with Codd's Normal Forms, as opposed to the insanity of implementing the NFs is fragmented progressive steps).

Obstacle

One problem that needs to be understood and eliminated is this. The "theoreticians" market and propagate 1960's Record Filing Systems under the banner of "relational". That is characterised by a Record IDs in every file. That method ensures the database remains physical, not logical, the very thing that Codd overcame with his Relational Model: a database that is logical and therefore extremely easy to navigate, by any querying party, current; planned; or unplanned.

The essential difference between 1960's RFS and post-1970 Relational Databases is:

whereas the RFS maintains references between Files by physical pointer (Record ID), the Relational Database maintains references between Tables by logical Key.
A logical Key is "made up from the data" as per Codd
- (A datum that is fabricated by the system is not "made up from the data")
- (Use of the SQL command PRIMARY KEY does not magically anoint the datum with the properties and qualities of a Relational Key: if you use PRIMARY KEY RecordID you are in 1960's physical paradigm, not the post-1970 Relational paradigm)
Logical Keys provide Relational Integrity (as distinct from Referential Integrity, which is an ordinary function of SQL), which is far superior to that obtained by 1960's RFS
As well as far superior Speed and Power (far less JOINs, and smaller sets)

Relational Database

Therefore I will give you the answer as a Relational Data Model, as per Codd.

Ampix0TA

Just one example of Relational Integrity:
- the ServiceProperty FK elements in UserServiceProperty is constrained to PK (particular combination) in ServiceProperty
- a UserServiceProperty row with Facebook.Email is prevented
- A Record ID based 1960's RFS that the "theoreticians" promote as "relational" cannot do that, various errors such as that one are allowed.
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
The IDEF1X Anatomy is a refresher for those who have lapsed.
If you have trouble reading the Predicates directly from the Data Model, let me know and I will produce them in text form.

Please feel free to ask questions, the more specific the better.

Thank you so much for your detailed answer and examples. After reading your example I feel as though I may have overthought the issue. I am confident this is the correct solution. Thank you for making it clear. — Ampix0, Nov 17 '19 at 02:33
@Ampix0 1) You are most welcome. 2) I corrected one error in the example data, and added a few words of detail. 3) If you would like to get a feel for the **strategy**, read some of my other Answers. — PerformanceDBA, Nov 17 '19 at 07:06

GMB · Answer 2 · 2019-09-29T08:53:54.247

1

You could set up:

a referential table called services to list all the available services, with columns like service_id (primary key), service_name and descriptions and so on. Each service is represented as one record in this table.
a table called services_properties to store the properties of the services; this table has 3 columns: service_id (foreign key to the primary key of services), property_name and property_value. A unique constraint can be set up on service_id/propery_value tuples to avoid duplicates. Each service has several records in the services_properties table. This flexible structure lets you store as many different properties as needed for each service without creating a new table for each service
a mapping table called user_services, that relates users to services. Columns would be service_id and user_id, as foreign keys to the primary keys of the services table and users table. You can query this table to easily list the services subscribed by each user.

edited Sep 29 '19 at 08:53

answered Sep 29 '19 at 01:22

GMB

216,147
25
84
135

I may need to sketch this out. This seems somewhat similar to option B, I think I had come up with something slightly similar as well. What do you image here is in "property_name" and "property_value"? That part I am a little lost on. – Ampix0 Sep 29 '19 at 16:16
@Ampix0: this is different. In option B, each service gets in own table (which makes it then difficult to query in a generic manner). In my answer, all services go into the same table, that store the properties of all services. – GMB Sep 29 '19 at 16:51
Is there a name for this strategy so I could research this more? I am close here but missing one piece. I understand now what you mean with each service having multiple listing under service property but I am missing something where those values are then mapped to users. Right now it seems like the values are mapped to the service. I'd like to research this more but am not sure what to call this. – Ampix0 Sep 29 '19 at 17:22
@Ampix0: I see what you mean now. For that purpose, you would need one more column in the table to store the user id. In that situation, mapping table user_services is not mandatory. – GMB Sep 29 '19 at 17:27

What is the optimal relational database design for storing an unknown number of similar but unique entities

2 Answers2

Obstacle

Relational Database