0

Here's the easiest way I can think of to explain this. Imagine a user wants to bookmark a bunch of webpages. There's a url table with a UrlID and the actual url. I'd like the user to have a list of UrlIDs which are unique (but I don't need the constraint) and a 32bit int value such as an epoch date. The only two things I care about is 1) being to check if UrlID is in this list or not and 2) get the entire list and sort it by date (or second value)

If it helps I'm expecting no more than 8K bookmarks but most likely it will be <128

  • 1
    Why not use another table? – Sami Kuhmonen Sep 13 '16 at 10:32
  • @SamiKuhmonen: I'll have to put an index on the user column and it feels weird to repeat the UserID so many times. I was using an int array but wanted the second value and wasn't sure how to search every other element –  Sep 13 '16 at 10:34
  • @acidzombie24 You do need a separate table, with columns `UserID integer` and `urlID integer`, and with a unique index on `(UserID,urlID)`. There's nothing wrong with repeating the `UserID`, it's just an integer, not a string. Alternatively, you can put the `UserID` in the url table if you know that each url will be bookmarked by exactly one user. – redneb Sep 13 '16 at 10:43
  • @redneb: I'm reading about "Composite Types". Can I have an array of that and say `where val = ANY(user.bookmark)`. –  Sep 13 '16 at 10:46
  • @acidzombie24 Yes you can do that, but I think it will be slower and is less elegant. – redneb Sep 13 '16 at 10:47
  • @redneb: Why is that? I'm always pulling the entire list and now that I think about it I'll be comparing many values so I can't think of a time I won't need to access the entire list –  Sep 13 '16 at 10:58
  • ATM I can search (1,2)::mytype in the array but IDK how to search just the first value –  Sep 13 '16 at 11:01
  • 2
    "*and it feels weird to repeat the UserID so many times*" - that's how relational databases and normalization work. Nothing weird about that. –  Sep 13 '16 at 11:17
  • @a_horse_with_no_name: but... why would I ever want an array (which postgresql supports). Why not have every element it's own row? –  Sep 13 '16 at 11:26
  • @acidzombie24 A fairly common wisdom, which I don't entirely agree with, is that they exist to replace situations when you'd be naming columns like `foo1, foo2, foo3` or similar. (There are also times when using an array would be faster and that could happen at an optimization stage.) It really sounds like your particular use case works pretty well with a traditional relational design, so I'd really consider just using that. Postgres supports many other use patterns and I don't have a problem using them, but it should usually be done very intentionally. – Mike Graham Sep 13 '16 at 11:48
  • [You are not the first one stuck with this idea.](http://stackoverflow.com/questions/8016776/can-postgresql-have-a-uniqueness-constraint-on-array-elements/8017013#8017013) Typically, it's best to implement a standard [many-to-many relationship](http://stackoverflow.com/questions/9789736/how-to-implement-a-many-to-many-relationship-in-postgresql/9790225#9790225) – Erwin Brandstetter Sep 13 '16 at 12:49
  • @ErwinBrandstetter: I'm not too concerned about uniqueness ATM. My main concern is having a small size since this data will expire after a few weeks. If the array will be no more than 2K elements per user it seems like the best idea is to have it all in one array. A array that probably will always fit in one page –  Sep 13 '16 at 14:18

1 Answers1

0

If you really want to avoid the extra table to express the relationship, you can do something like that:

CREATE TABLE "user" (
    id integer primary key,
    name text not null,
    bookmarks integer[] not null
);
CREATE TABLE url (
    id integer primary key,
    time timestamp with time zone not null,
    val text not null
);

Then finding all bookmarks for a particular user (say with id 66) would involve doing something like that:

SELECT url,time
FROM (SELECT bookmarks FROM "user" WHERE id=66) u
     JOIN url ON url.id=ANY(bookmarks)
ORDER BY TIME;

Now here's why I don't like this schema. First, adding a new bookmark would require to rewrite the bookmarks array and hence the entire user row (so adding n bookmarks, one after the other, would require Θ(n^2) time). Secondly, you cannot use foreign keys on the elements of the array. Thridly, many queries will become more complicated to write, e.g. in order to retrieve all bookmarks for all users, you have to do something like that:

SELECT "user".id,"user".name,url.val,url.time
FROM "user",
     LATERAL unnest((SELECT bookmarks)) b
     LEFT JOIN url ON b = url.id;

Edit: So here's the schema I would use and which I think fits best with the relational paradigm

CREATE TABLE "user" (
    id integer primary key,
    name text not null
);
CREATE TABLE url (
    id integer primary key,
    val text not null
);
CREATE TABLE bookmark (
    user_id integer not null REFERENCES "user",
    url_id integer REFERENCES url,
    time timestamp with time zone not null,
    UNIQUE (user_id,url_id)
);
redneb
  • 21,794
  • 6
  • 42
  • 54
  • Hey sorry if I wasn't clear. The time would be when the user was bookmarked it not when the url was first inserted into the database. Obviously with my real data is not really timestamps and strings. I was looking to search 5 in an array of pairs such as `(12, 454545), (5, 654454))`. I only want it to match the first value of the pair. I may have a filter on the second (ie < or > than some date). I'm writing a new question for the solution I have in mind –  Sep 13 '16 at 11:21
  • You mean that the `bookmarks` column should be an array of tuples (specifically pairs)? – redneb Sep 13 '16 at 11:24
  • Yeah I wrote [a question here](http://stackoverflow.com/questions/39469098/how-do-i-write-a-where-statement-a-composite-type) but if it fails I'll use this answer. I need to reread the postgresql manual I haven't yet read the whole thing –  Sep 13 '16 at 11:40