A better way to store unknown number of attributes

Question

Currently using Postgres 9.3 I have a Table Person(Id, FName, Lname, Address1, Adress2, phone1, phone1,....)

I could do Person(id, FName, Lname) and then Address(PersonID, AddressName, Address) and Pone(PersonID, PhoneName, Number)

But when when I need to add a new attribute, say email, I need to change the schema and add Email(PersonID, EmailName, Address)

What I want to do is Person(ID, AtrbLbl, AtribVal)

1, Fname, Ron
1, Lname, H
1, HomeEmal, rh@home.ca
1, HomeAddress, 123 st edmonton
2, LName, Smith
3, Fname, Bob
2, Fname, Sam
3, Lnaem, Marly
3, HomeAdress, Heven
2, HomeAddress, abc St.
1, FavorateColor, red
2, FavorateColor, red
3, FavorateColor, red
1, FavorateIcream, Chocolate 
2, FavorateIcream, Vanila
3, FavorateIcream, Mint
4, FName, tom
4, FavorateColor, blue

Where I, Ron H, am made up of all id = 1 and if, say I got a job you could add 1, WorkEmail, rh@Work.ca

So if I want all the attributes of everyone who's FavorateColor is red

Select * from person where id in (Select ID from person where  AtrbLbl = FavorateColor and AtribVal = red)`

My problem is search more than one attribute. In sudo sql what I want is

Select * from person where id in (Select id from person where (AtrbLbl = FavorateColor and AtribVal = red) AND (AtrbLbl = Fname and AtribVal = Ron)

Obviously that won't work.

What I was thinking of doing is

insert into temptbl
Select Count(id) cnt, ID from person where (AtrbLbl = FavorateColor and AtribVal = red) OR (AtrbLbl = Fname and AtribVal = Ron) 

Select * From person where id in (select id from temtbl where cnt = 2)  order by id
where 2 is the number of searched attributes.

So if I wanted the persons who like red, Chocolate and FName Ron

insert into temptbl
Select Count(id) cnt, ID from person where (AtrbLbl = FavorateColor and AtribVal = red) OR (AtrbLbl = Fname and AtribVal = Ron) OR (AtrbLbl = FavorateIcream and AtribVal = Chocolate) 

Select * From person where id in (select id from temtbl where cnt = 3)  order by id

In my mind I should be able to do this in on statement by joining the results from one part of the where to the results of another part.

Can anyone think of a single statement that can do this? Or a more elegant method?

NoSQL databases are basically made to do this. You might want to look into one of them if this is a common problem you are having. SQL works best with a defined schema. — MartianCodeHound, Jun 11 '15 at 20:16
I'll point out that this is an example of what I'm trying to do. In my real test I have a Person with a base set of cols like Name and birthday and a table personAttributes where all the other attributes are and I think I'll have an AttributeLable table to hold all of the already defined labels so that I don't get eye, and eyecolor for example. — Ron H, Jun 11 '15 at 20:16
@MartianCodeHound I'd love to switch over to one, I have looked at it and you're 100% right, it's the way to go but I have some other constraints. I'm interested in seeing how other people have broken SQL to make this kind of work. — Ron H, Jun 11 '15 at 20:19
Using the NoSQL Capabilities in Postgres : http://info.enterprisedb.com/rs/enterprisedb/images/EDB_White_Paper_Using_the_NoSQL_Features_in_Postgres.pdf This might also be relevant. Looks like I could get the best of both worlds maybe — Ron H, Jun 11 '15 at 20:25
store the data in Person(ID, AtrbLbl, AtribVal) form. write a view that returns the data desired using a pivot query (or [postgresql](http://wiki.postgresql.org/images/0/0b/PGforSmarties.pdf) version of it). If the number of columns are still unknown, make it a dynamic SQL and pivot. [Stack Example](http://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905) — xQbert, Jun 11 '15 at 20:32
also check out SPARQL and RDF. but please dont do this in Postgres or another relational database - it will ruin the lives of anyone who needs to actually write the queries. — Randy, Jun 11 '15 at 20:54
Check out the `hstore` module. An extremely efficient key/value store as a datatype for a column: http://www.postgresql.org/docs/current/static/hstore.html — , Jun 11 '15 at 21:39

score 2 · Accepted Answer · answered Jun 12 '15 at 02:56

Classic SQL works better with static schema.

Still, it is possible to write a single query in your case.

For example, you want to find all people who have:

FavorateColor = red
AND
Fname = Ron
AND
FavorateIcream = Chocolate

Do three separate queries for each attribute and return only those IDs that match all three filters:

SELECT *
FROM PersonDetails
WHERE PersonID IN
    (
        SELECT ID
        FROM person
        WHERE AtrbLbl = 'FavorateColor' AND AtribVal = 'red'

        INTERSECT

        SELECT ID
        FROM person
        WHERE AtrbLbl = 'Fname' AND AtribVal = 'Ron'

        INTERSECT

        SELECT ID
        FROM person
        WHERE AtrbLbl = 'FavorateIcream' AND AtribVal = 'Chocolate'
    )

So, it is possible, but, I personally, would not do it. I would have separate tables for People, Addresses, Phones, Emails, just as you described in the beginning of your question.

INTERSECT was what I was missing, Thanks. Can anyone speak to the [dis]advantages of this method vs hstore vs properly normalized like Vladimir mentioned he would prefer? — Ron H, Jun 12 '15 at 14:17
How does do the various schemaless options perform? http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf — Ron H, Jun 12 '15 at 15:29

score 1 · Answer 2 · answered Jun 12 '15 at 02:50

An entity-attribute-value approach might work well for this case. More info here:

http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

Here is a simplified example.

drop schema example;

create schema example;

use example;

create table attribute_type (
    type_code varchar(16) primary key
);

create table person (
    person_id int primary key,
    person_name varchar(64)
);

create table person_attribute_value (
    person_id int references person(person_id),
    attribute_type varchar(16) references attribute_type(type_code),
    string_value varchar(64)
);

insert into attribute_type values ('phone');
insert into attribute_type values ('email');
insert into attribute_type values ('snail_mail_addr1');
insert into attribute_type values ('snail_mail_addr2');
insert into attribute_type values ('snail_mail_city');
insert into attribute_type values ('snail_mail_state');
insert into attribute_type values ('snail_mail_zip');

insert into person values (1, 'Larry');
insert into person values (2, 'Moe');
insert into person values (3, 'Curly');

insert into person_attribute_value values(1, 'phone', '(860)555-1234');
insert into person_attribute_value values(2, 'phone', '(860)555-1234');
insert into person_attribute_value values(3, 'phone', '(860)555-1234');
insert into person_attribute_value values(2, 'snail_mail_addr1', '123 Evergreen Terrace');
insert into person_attribute_value values(2, 'snail_mail_city', 'Springfield');
insert into person_attribute_value values(2, 'snail_mail_state', 'MA');

select
    person.*,
    phone.string_value phone,
    addr1.string_value addr1,
    addr2.string_value addr2,
    city.string_value city,
    state.string_value state
from
    person
    left outer join person_attribute_value phone on person.person_id = phone.person_id and phone.attribute_type = 'phone'
    left outer join person_attribute_value addr1 on person.person_id = addr1.person_id and addr1.attribute_type = 'snail_mail_addr1'
    left outer join person_attribute_value addr2 on person.person_id = addr2.person_id and addr2.attribute_type = 'snail_mail_addr2'
    left outer join person_attribute_value city on person.person_id = city.person_id and city.attribute_type = 'snail_mail_city'
    left outer join person_attribute_value state on person.person_id = state.person_id and state.attribute_type = 'snail_mail_state'
;

A better way to store unknown number of attributes

2 Answers2