Here's my final answer:
Create table Attribute(
ID int,
Name varchar(50));
Create table AttributeValue(
ID int,
AttributeID int,
Value varchar(50));
Create table userAttributeValue(
userID int,
AttributeID varchar(20),
AttributeValue varchar(50));
Create table User(
ID int);
Insert into user (ID) values (1),(2),(3),(4),(5);
Insert into Attribute (ID,Name) Values (1,'Location'),(2,'nwSpeed'),(3,'PriceRange');
Insert into AttributeValue values
(1,1,'bengaluru'),(2,1,'chennai'),
(3,2, 'w'), (4, 2,'4G'), (5,2,'3G'), (6,2,'2G'), (7,2,'G'), (8,2,'NA'),
(9,3,'1'), (10,3,'2'), (11,3,'3'), (12,3,'4'), (13,3,'5'), (14,3,'6'), (15,3,'7'), (16,3,'8'), (17,3,'9'), (18,3,'10');
Insert into UserAttributeValue (userID, AttributeID, AttributeValue) values
(1,1,1),
(1,2,5),
(1,3,9),
(2,1,1),
(2,2,4),
(3,2,6),
(2,3,13),
(4,1,1),
(4,2,4),
(4,3,13),
(5,1,1),
(5,2,5),
(5,3,13);
Select USERID
from UserAttributeValue
where (AttributeID,AttributeValue) in ((1,1),(2,4))
GROUP BY USERID
having count(distinct concat(AttributeID,AttributeValue))=2
Now if you need a count wrap userID in count and divide by the attributes passed in as each user will have 1 record per attribute and to get the "count of users" you'd need to divide by the number of attributes.
- This allows for N growth of Attributes and the AttributeValues per user without changes to UI or database if UI is designed correctly.
- By treating each datapoint as an attribute and storing them in once place we can enforce database integrity.
- Attribute and AttributeValue tables becomes lookups for UserAttributevalue so you can translate the IDs back to attribute name and the value.
- This also means we only have 4 tables user, attribute, attributeValue, and UserAttributeValue.
- Technically you don't have to store attributeID on the userAttributeValue, but for performance reasons on later joins/reporting I think you'll find it beneficial.
- You need to add proper Primary Key's, Foreign keys, and indexes to the tables. They should be fairly self explanatory. On UserAttributeValue I would have a few Composite indexes each with a different order of the unique key. Just depends on the type of reporting/analysis you'll be doing but adding keys as performance tuning is needed is commonplace.
Assumptions:
- You're ok with all datavalues being varchar data in all cases.
- If needed you could add a datatype, precision, and scale on the attribute table and allow the UI to cast the attribute value as needed. but since they are all in the same field in the database they all have to be the same datatype. and of the same precision/scale.
- Pivot tables to display the data across will likely be needed and you know how to handle those (and engine supports them!)
Gotta say I loved the metal exercise; but still would appreciate feedback from others on SO. I've used this approach in 1 systems I've developed and it's been in two I've supported. There are some challenges but it does follow 3rd normal form db design (except for the replicated attributeID in userAttributevalue but that's there for performance gain in reporting/filtering.