0

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:

enter image description here

I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:

  1. Who made the change
  2. A message describing the change
  3. Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity

I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".

I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:

  • Have one column per ID type in the log and subscription tables, with an index on each column
  • Use full text search, combining the ID types as a tsvector
  • Use an hstore or json column for the IDs, and index the contents somehow
  • Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI

Any insights appreciated :)

Edit It seems what I'm talking about it an activity stream. The suggestion in this answer to filter by time first is sounding pretty good.

Community
  • 1
  • 1
z0r
  • 8,185
  • 4
  • 64
  • 83
  • Pavel's solution directly shows the changes that were made to the data in a hstore column: http://okbob.blogspot.co.uk/2015/01/most-simply-implementation-of-history.html –  Oct 30 '15 at 07:41
  • Thanks, but that doesn't address filtering by relevance. Is there a trick for indexing an hstore in a flexible way? – z0r Oct 30 '15 at 09:22

1 Answers1

1

Since the objects all use uuid for the id field, I think I'll create the activity table like this:

  • Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
  • Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for marching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
  • Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.

Entity diagram of activity and subscription tables

I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")

Update this is working well. The SQL to fetch a timeline looks something like this:

SELECT *
FROM (
    SELECT DISTINCT ON (activity.created, activity.id)
        *
    FROM activity
        LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
            ON true
        LEFT OUTER JOIN subscription
            ON subscription.object_id = act_ref.act_ref
    WHERE activity.created BETWEEN :lower_date AND :upper_date
        AND subscription.user_id = :user_id
    ORDER BY activity.created DESC,
        activity.id,
        act_ref.ordinality DESC
) AS sub 
    WHERE sub.subscribed = true;

Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment @> operator, and no subquery:

SELECT *
FROM activity
    JOIN subscription ON activity.object_ref @> subscription.object_id
WHERE subscription.user_id = :user_id
    AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;

You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

z0r
  • 8,185
  • 4
  • 64
  • 83