
With this schema:

create table object (
   obj_id      serial      primary key,
   name        varchar(80) not null unique,
   description text,
   tag_arr     int[]
);

create table tag (
   tag_id      serial      primary key,
   label       varchar(20) not null unique
);

An object may have any number of tags attached. Instead of an object-x-tag cross table, I wish to keep the tag_ids in an array so they can be easily fetched with the object record.

How do I create an index on object so that each element of tag_arr is indexed?

That said, are there better ways to solve this problem?

Discussion

This could be achieved with:

create table obj_x_tag(
   obj_id    int references object,
   tag_id    int references tag,
   constraint obj_x_tag_pk primary key( obj_id, tag_id )
);

select obj_id, name, description, array_agg( tag_id )
from object o
join obj_x_tag x using( obj_id )
group by 1, 2, 3;

But to me it makes more sense to simply keep the array of tag_ids in a column and dispense with the cross table and array_agg().
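To illustrate what fetching looks like with the array-in-column approach, here is one possible query against the schema above (a sketch; resolving the array elements against the tag table via `= any(...)` is just one way to do it):

```sql
-- Fetch an object together with its tag labels, resolving the
-- elements of tag_arr against the tag table.
select o.obj_id, o.name, array_agg( t.label ) as labels
from object o
join tag t on t.tag_id = any( o.tag_arr )
where o.obj_id = 1
group by o.obj_id, o.name;
```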

It was suggested to use the approach from PostgreSQL SQL: Converting results to array. The problem, as noted, is that "this doesn't actually index individual array values, but instead indexes the entire array".

It was also suggested to use Postgres's intarray extension with a GiST (or GIN) index. The problem, to me, is that such an index targets the standard set-based array operators (does one array contain another, do two arrays intersect), not specifically finding a single element of an array. Size-wise and speed-wise, it seems counter-intuitive that so wide a solution is right for so narrow a problem. Also, the intarray extension appears limited to int4, covering neither bigint nor text, which limits its usefulness.
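For what it's worth, the single-element lookup can be phrased with the same containment operator that a plain (built-in, no extension) GIN index on the array supports, so the "wide" operator set does cover the narrow case. A sketch against the schema above (the index name is illustrative):

```sql
-- Built-in GIN index on the array column, no extension required:
create index object_tag_arr_idx on object using gin ( tag_arr );

-- "Which objects carry tag 123?" expressed as array containment,
-- which this GIN index can serve:
select obj_id, name
from object
where tag_arr @> array[123];
```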

cc young
  • You should use GIN check this out http://stackoverflow.com/questions/4058731/can-postgresql-index-array-columns – JoseP Jun 03 '12 at 03:15
  • Go for a normalized model and store that information in a table containing the `obj_id` and the `tag_id`; that will save you a lot of trouble in the long run. Just think of a query that gives you the count of how often a tag was used. –  Jun 03 '12 at 07:10
  • @a_horse_with_no_name _really_ don't want to get into a back and forth, but seems to me that, if each element of an array were indexed, then things like "a query that gives you the number how often a tag was used" would be the same cost as a cross table (which is really just an index table itself). – cc young Jun 03 '12 at 09:32
  • @JoseP - thanks for the link. addressed using GIN in added Discussion part of the question. – cc young Jun 03 '12 at 09:33

3 Answers


You can create GIN indexes on any 1-dimensional array with standard Postgres.
Details in the manual here (last chapter).

While operating with integer arrays (plain int4, not int2 or int8, and no NULL values), the additionally supplied module intarray provides many more operators and typically superior performance. Install it (once per database) with:

CREATE EXTENSION intarray;

You can create GIN or GiST indexes on integer arrays. There are examples in the manual.
CREATE EXTENSION requires PostgreSQL 9.1 or later. For older versions you need to run the supplied script.
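With the extension installed, an intarray-backed index might look like this (a sketch; `gin__int_ops` is the GIN operator class supplied by intarray, and the index name is illustrative):

```sql
-- GIN index using the intarray operator class:
create index object_tag_arr_gin on object using gin ( tag_arr gin__int_ops );

-- Containment queries on the array can then use this index:
select obj_id, name
from object
where tag_arr @> '{123}'::int[];
```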

Erwin Brandstetter
  • you are really good - thanks again. looked at `intarray` and am not sure that it is correct - have placed my notes in the body of the question. – cc young Jun 03 '12 at 09:19

The traditional solution would be to use a table of tags and a many-to-many table between tag and object. Then you can index the tag table and pull everything in a single select statement via a join. If you're not happy with the programming model, check out your local friendly ORM vendor.

I'm not a PostgreSQL expert by any means, but this doesn't seem like a good use case for arrays.

easel
    agree that a cross table between object and tag is the classical solution - indeed my current solution. but in this case a cross table is expensive and cumbersome for something very lightweight. if arrays could be indexed, then it seems to me it would be a good solution, despite the lack of RI. – cc young Jun 03 '12 at 05:00
  • @ccyoung: why is a cross table "expensive". Using tables to store data is what relational databases are made for. –  Jun 03 '12 at 07:08
  • @a_horse_with_no_name I am somewhat of a database bigot, as I suspect you probably are. but I also like many of non-classical elements such as nested tables and using `array_agg()` function to pull in using one row when the child rows are trivial. to me this makes sense and makes life easier. have expanded question to give better examples. – cc young Jun 03 '12 at 09:27
  • @ccyoung: just think of a query that shows the count how often each tag was used. That'll be utterly complex with an array but a very easy with a normalized model. Also finding objects that have two or three specific tags will be a lot easier as well. –  Jun 03 '12 at 09:31
  • @a_horse_with_no_name `select count(1) from obj where 123456 = ANY( tag_arr )` - if `tag_arr` elements are indexed, it would be efficient as well – cc young Jun 03 '12 at 09:50
  • @ccyoung: Now get that count for all tags (piece of cake using a group by). But apparently you do want to find a workaround for a problem that you wouldn't have if you used a normalized model. –  Jun 03 '12 at 09:54
  • @a_horse_with_no_name: The problem with many-many tables is that per-row overhead in most RDBMSes is very large. In Postgres, it's at minimum 24 bytes. The overhead on a 1D array is 8 bytes *for the entire array*. That makes a huge difference. Generating the complete set is easy using unnest(). – Jim Nasby May 13 '16 at 18:20

This is my workaround, because I see no optimized PostgreSQL internal function to do the same:

CREATE FUNCTION unnest_with_idx(anyarray) RETURNS 
table(idx integer, val anyelement) AS $$ 
   SELECT generate_series(1,array_upper($1,1)) as idx, unnest($1) as val;
$$ LANGUAGE SQL IMMUTABLE;
-- Test:
SELECT idx,val from unnest_with_idx(array[1,20,3,5]) as t;

To check whether an internal function exists, see the question "How to access array internal index with PostgreSQL?".


Edited after @JimNasby's comment.

Solution for pg9.4+

SELECT * FROM unnest(array[20,11,3,5]) WITH ORDINALITY;

The WITH ORDINALITY clause produces an additional column, "ordinality", holding the array index. See also this tutorial.

In pg9.5+, it works fine also for JSON arrays!

 SELECT * FROM jsonb_array_elements( '[20,11,3,5]'::JSONB ) WITH ORDINALITY;
Peter Krauss
  • You can now use the WITH ORDINALITY option: SELECT * FROM unnest(array) WITH ORDINALITY;. Not sure what version added that though. – Jim Nasby May 13 '16 at 18:17