1

I need search functionality on a website where among other things you should be able to select multiple categories. The searches will be stored in the database but each unique combination of search parameters should only be stored once, this also includes the unique combination of selected categories.

The problem is that I cannot figure out how to store the combinations of selected categories. I have looked at arrays and found this http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/ but apparently that feature was never implemented.

So if I need to use multiple tables I was thinking I could have one table for the combinations that has an id for each unique combination that can be referenced and compared easily and then another table linking the categories to the combination. But with this approach how can I check if a combination already exists? The only way I can think of is to loop through all existing combinations and check if any one of those equal the compared combination.

I feel like this cannot be such an uncommon problem but I can't find any examples that do this. I also feel like my approach may not be the best. Any suggestions are very much welcome.

I have these two tables currently:

Categories
- CategoryId (int)
- Name (string)

Searches
- SearchId (int)
- Keywords (string)
- ExampleOption1 (bool)
- ExampleOption2 (bool)
- CategoriesCombinationId (int) -- this would represent the unique combination of categories and links to the combination table

And this is how I might try to solve the problem (if there is a good way to check if a combination already exists):

CategoriesCombinations -- unique combinations
- CombinationId (int)

CombinedCategories
- CombinationId (int) -- links to id in combinations table
- CategoryId (int) -- links to id in categories table
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
trist
  • 35
  • 1
  • 4

2 Answers2

0

For your current approach you can use string_agg to have a string representation of all Categories in each CategoriesCombinations and check that against the new search:

SELECT CombinationId
FROM CombinedCategories
WHERE string_agg(CategoryId, ',') = '84,95,102'
GROUP BY CombinationId

But the simpler approach would be calculating a unique hash for each search based on all parameters and store that in Searches table and compare hash of the new search against search history.

Amir Rahimi Farahani
  • 1,580
  • 1
  • 12
  • 14
0

Store as array (denormalized)

I would consider the additional module intarray that provides the convenient (and fast) functions uniq() and sort(). In a typical modern Postgres installation it's as easy as:

CREATE EXTENSION intarray;

Using these, a simple CHECK constraint can enforce ascending arrays with distinct elements.

CHECK (uniq(sort(cat_arr)) = cat_arr)

You can additionally (optionally) have a trigger that normalizes array values ON INSERT OR UPDATE automatically. Then you can just pass any array (possibly unsorted and with dupes) and everything just works. Like:

CREATE OR REPLACE FUNCTION trg_search_insup_bef()
  RETURNS trigger AS
$func$
BEGIN
   NEW.cat_arr := uniq(sort(NEW.cat_arr);
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER insup_bef
BEFORE INSERT OR UPDATE OF cat_arr ON search
FOR EACH ROW
EXECUTE PROCEDURE trg_search_insup_bef();

The additional module intarray is optional, there are other ways:

But the intarray functions deliver superior performance.

Then you can just create a UNIQUE constraint on the array column to enforce uniqueness of the whole array.

UNIQUE (cat_arr)

I wrote more about the advantages of combining (very strict and reliable) constraints with (less reliable but more convenient) triggers in this related answer just two days ago:

If, for each combination, all you need to store per category is the ID (and no additional info), this should be good enough.
However, referential integrity is not easily ensured this way. There are no foreign key constraints for array elements (yet) - like documented in your link: If one of the categories is deleted or you change IDs, references break ...

Normalized schema

If you need to store more or you'd rather go with a normalized schema to enforce referential integrity or for some reason, you can do that, too, and add a trigger to populate a hand-made materialized view (a redundant table) and enforce uniqueness in a similar way:

CREATE TABLE search (
  search_id serial PRIMARY KEY
, ... more columns
);

CREATE TABLE cat (
  cat_id serial PRIMARY KEY
, cat text NOT NULL
);

CREATE TABLE search_cat (
  search_id int REFERENCES search ON DELETE CASCADE
, cat_id    int REFERENCES cat
, PRIMARY KEY (search_id, cat_id)
);

Related answer (not for unique combinations, but for unique elements) that demonstrates the trigger:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This would work but I will skip the module and sort the array and remove duplicates in the application before the comparison unless there are some hidden advantages in using it. – trist Apr 19 '15 at 19:21
  • @trist: The advantage of intarray functions is mainly performance. With the `CHECK` in place you can rely on consistent data in the column at all times and, combined with the `UNIQUE` constraint, across the table as well. If you implement constraints in your application only, reliability is limited to exclusive access through the application (and even then it's far less reliable). An RDBMS is optimized for jobs like that. Application code cannot compete with that. I added more pointers and links. – Erwin Brandstetter Apr 19 '15 at 21:45
  • Just to clarify how this solves my problem. I realise your answer may actually be two possible solutions but I intend to use both at the same time. The array allows me to easily check for uniqueness and the table stores the list of categories with foreign key constraints. Since the array is only used for uniqueness it doesn't matter if a category is removed with its id remaining in the array because the table is always used to get the selected categories. A search that included a removed category is still going to be unique even though that category cannot be selected again. – trist Apr 20 '15 at 13:48