2

To preface, I'm no DBA or SQL expert. But I've taken on a personal project that requires me to wear all hats in making a social network. (No, I'm not trying to reinvent Facebook. I'm targeting a niche audience.) And yes, I've heard of frameworks such as http://activitystrea.ms/, but I feel like data serialization should be a last resort for my needs.

Anyway, How to implement the activity stream in a social network helped me get the ball rolling, but I have some unanswered questions.

Below is my database schema (some rows have been omitted for simplification):

Action table:

id    name
-------------
1     post
2     like
3     follow
4     favorite
5     tag
6     share

Activity table:

id             (int)
user_id        (int)
action_id      (tinyint)
target_id      (int)
object_id      (tinyint)
date_created   (datetime)

The object_id refers to which object type the target_id is. The idea here is to represent (User + Action + Target Object)

  • User Post(s) Media
  • User Favorite(s) Scene
  • User Follow(s) User

Object (type) table:

id    name
-------------
1     media
2     scene
3     brand
4     event
5     user

The problem here is that each object has its own separate table. For example:

Media table:

id            (int)
type          (int)
thumbnail     (varchar)
source        (varchar)
description   (varchar)

Event table:

id        (int)
user_id   (int)
name      (varchar)
city      (int)
address   (varchar)
starts    (time)
ends      (time)
about     (varchar)

User table:

id                (int)
username          (varchar)
profile_picture   (varchar)
location          (int)

What, then, would be the best (i.e., most efficient) way of querying this database?

Obviously I could perform a SELECT statement on the activity table, and then – based on the object_id – use conditional logic in PHP to make a separate query to the appropriate object's table (e.g., media).

Or would it be smarter (and more efficient) to implement some sort of left or inner JOIN on all 5 object tables, as suggested here: MySQL if statement conditional join. I'm not entirely familiar with how JOINS work, and whether SQL is smart enough to only scan the appropriate object table for each activity row, rather than ALL the joined tables.

Of course the first solution means MANY more calls to the database, which is less desirable. However, I'm not sure how else I could retrieve all the relevant columns (e.g., media "source", event "address") in just one query without implementing some conditional logic.

Community
  • 1
  • 1
ephankoral
  • 23
  • 1
  • 4
  • what if you make a join table for every object type as you do in a many to many relationship. with this mechanism in place, you could join everything in one query without conditional joins and just get the values that are actually related to each other. – Raphael Müller Mar 13 '15 at 08:32
  • @RaphaelMüller Sounds like a potential solution. Could you show me what that might look like exactly (i.e., a database schema)? Like I said, I'm no SQL expert so I'm having trouble picturing it. – ephankoral Mar 13 '15 at 09:24
  • I just thought of another possible solution.. After I make a standard query to the activity table, I could loop over the activity rows to create an array for each object type with their target_id(s). Then I could query each object table like so (select * from media where id in (1, 2, 3, 4...)), and use conditional logic with php to display the results. This way, I only make a maximum of 6 queries (assuming there are 5 objects). This is the easiest solution I've come up with, but is it still too complicated? – ephankoral Mar 13 '15 at 09:50
  • this solution with the logic in php was my second thought too ;) I use something similar to compose dynamic content with modules (where every module has its own template with the coresponding table behind) – Raphael Müller Mar 13 '15 at 09:56

2 Answers2

5

Suppose, you change your activity table a little bit:

Activity table:

id             (int)
user_id        (int)
action_id      (tinyint)
object_id      (tinyint)
date_created   (datetime)

and your join table for every target type:

activity_id    (int)
target_id      (int)

and finally your target table (media)

id            (int)
type          (int)
thumbnail     (varchar)
source        (varchar)
description   (varchar)

and target table (event)

id        (int)
user_id   (int)
name      (varchar)
city      (int)
address   (varchar)
starts    (time)
ends      (time)
about     (varchar)

now, you can select the data with

SELECT
 activity.id,
 activity.user_id,
 activity.action_id,
 action.name,
 activity.object_id,
 object.name,
 media.id as media_id,
 media.type,
 media.thumbnail,
 media.source,
 media.description,
 event.id as event_id,
 event.name,
 ...
FROM
 activity
 LEFT JOIN action ON (action.id = activity.action_id)
 INNER JOIN mediaToActivity ON (mediaToActivity.activity_id = activity.id)
 LEFT JOIN media ON (media.id = mediaToActivity.target_id)
 INNER JOIN eventToActivity ON (eventToActivity.activity_id = activity.id)
 LEFT JOIN event ON (event.id = eventToActivity.target_id)

with this query you should get all rows in one query (but only the ones which actually exists are filled with data)

Note, I haven't tested this by now...

Raphael Müller
  • 2,180
  • 2
  • 15
  • 20
  • Interesting.. Thanks for the explanation! Is this solution only possible with separate join tables for every target type, or could the same query be accomplished with the original schema? – ephankoral Mar 13 '15 at 10:00
  • if you do a plain `LEFT JOIN` on your old layout it should also work. the corresponding ones should then turn out as `NULL` – Raphael Müller Mar 13 '15 at 10:01
  • Ahh, I see. I'm going to have to test these solutions tomorrow and report back. Looking for the most efficient way possible! – ephankoral Mar 13 '15 at 10:05
  • I just noticed, that if you use the old layout, you have to make a restriction in your `JOIN` or your target_id have to be unique over all tables. e.g. your `JOIN` could look like this: `LEFT JOIN event ON (event.id = activity.target_id AND activity.object_id = 4)` – Raphael Müller Mar 13 '15 at 10:06
  • I remember considering that problem before, and I intended to make the `target_id`'s unique. But that restriction seems like a much better solution! – ephankoral Mar 13 '15 at 21:49
  • 1
    I tested this solution (with the original schema – no separate join tables), and it does exactly what I need. Obviously there are a lot of `NULL` values in each row, but I don't see how that could be avoided. (I'm not sure if that even matters much when it comes to performance though?) Anyway, thank you. This seems to be the simplest solution, but I welcome suggestions from anyone else! – ephankoral Mar 14 '15 at 08:59
0

I pieced together from your discussion what your solution was. Fiddle

create table activity (
  id            int,   
  user_id       int,
  action_id     int,
  target_id     int,
  object_id     int,
  date_created  datetime
);
create table action (
  id int,
  name varchar(80)
);
create table object (
  id int,
  name varchar(80)
);
create table media (
  id int,
  type int,
  thumbnail varchar(255),
  source varchar(255),
  description varchar(255)
);
create table event (
  id       int,
  user_id   int,
  name      varchar(255),
  city      int,
  address   varchar(255),
  starts    time,
  ends      time,
  about     varchar(255)
);

-- setup
insert into action values (1, "post");
insert into object values (1, "media");
insert into object values (2, "event");

-- new event
insert into event values (1, null, "breakfast", null, "123 main st", null, null, "we will eat");
insert into activity values (1, null, 1, 1, 2,  null);

-- new media
insert into media values (1, null, null, null, "new media");
insert into activity values (2, null, 1, 1, 1,  null);

SELECT *
FROM
 activity
 left join event on (event.id = activity.target_id and activity.object_id = 2)
 left join media on (media.id = activity.target_id and activity.object_id = 1);
Harry Moreno
  • 10,231
  • 7
  • 64
  • 116