0

Context:

  • my user_feed table is stored in a MariaDB database
  • the table uses the InnoDBstorage engine

Clarity note: Throughout the question, whenever I use the term "user's feed" what I'm referring to is all the records in the user_feed table that have identical values set for the user_id field.

So initially, on login, the user gets the top 40 posts that have their user id as a foreign key in the user_feed table. The query gets the top 40 posts using an ORDER BY date_created clause. When the user scrolls down to, let's say post number 30, I want to query for the next 40 posts in their feed. Right now, I plan on using the date created of the last post the user has in the app, to determine what posts to get from the user_feed table.

My question is: If I set the date_created timestamp of a post when it is inserted into the user_feed table, is it possible that two posts for a particular user's feed will have the same timestamp?

user_feed table:

CREATE TABLE `user_feed` (
 `user_id` int(1) unsigned NOT NULL,
 `post_id` int(1) unsigned NOT NULL,
 `reposter_id` int(1) unsigned DEFAULT NULL,
 `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 PRIMARY KEY (`user_id`,`post_id`),
 KEY `user_id` (`user_id`),
 KEY `date_created` (`date_created`),
 KEY `post_id` (`post_id`),
 KEY `reposter_id` (`reposter_id`),
 CONSTRAINT `user_feed_ibfk_1` FOREIGN KEY (`post_id`) REFERENCES `post` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `user_feed_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Update: So the complexity that it takes to use time as an indicating factor for determining uniqueness seems to be out of the question. The next option would be to have a column id that increments with each new post, so I can query for the next 40 records in the user_feed table that correspond to a particular user_id and have ids that are smaller than the 40th post's id that was received from the last query. However, there seem to be some issues with this approach as well:

One, a user's feed cannot have records with identical post_ids i.e. if you looked at a user's entire feed, you wouldn't see two records that corresponded to the same post. This means that whenever a repost is made, a deletion has to occur if a particular user's feed contains a record with the same post_id of the one that was reposted. Then, there will be an insertion of a new record that has the reposter_id field set. The other option would be to update the existing record by setting the reposter_id field to the id of the reposter and the date_created field to the date it was reposted. Using updates seems to be more efficient, but with a new auto_incrementing id column, I'd have to update the auto_incrementing column manually by getting the next possible auto_increment value and use it to update the id field.

The immediate problem I see with this is: What if while the new repost's id field is being updated another user creates a post meant for this user's feed and because the idcolumn doesn't need to be set for new posts manually (the id of a post that never existed before will never already be in the user_feed table) the creation of the new post record beats the update of the other record and has the same id that was retrieved for the update; leading to a primary key already exists exception.

The other issue that seems to exist with using a unique id column for the entire table is if a single post will have a unique id for each user's feed it's placed in -- fan out system for getting a user's feed, if you haven't noticed by now. And, a single post can be reposted millions of times, so each of those reposts will also have a unique id. It seems like the value for id column would increment too quickly, unless each user's feed had it's own auto_incrementing field i.e. to get the next highest id value for a user's feed I'd have to add 1 to the result of the following query:

.

SELECT MAX('id') FROM user_feed WHERE user_id = :(id of the user whose feed    
        is changing)

Any feedback on the two points above?

the beest
  • 463
  • 6
  • 26
  • It is very much possible, what if two users simultaneously inserts some data at the same time and you get the same time-stamp for both. Never rely on timestamps for uniqueness. – Sajib Acharya May 11 '16 at 20:31
  • @SajibAcharya, so with innodb, writes to tables are made asynchronously? – the beest May 11 '16 at 20:33
  • Does not matter if it is synchronous or asynchronous. Even if you do get a timestamp in milliseconds, mysql can complete multiple queries within a fraction of a milli second (it also depends on the hardware, obviously), so you may end up getting similar timestamps. Look [here](http://stackoverflow.com/questions/10320667/can-i-get-a-unique-timestamp-for-every-record-in-mysql), [here](http://stackoverflow.com/questions/26882872/is-a-timestamp-in-microseconds-always-unique) and [here](https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/current_timestamp?lang=en). – Sajib Acharya May 11 '16 at 20:40
  • @SajibAcharya i don't know what you're talking about. all of my edits have been grammatical or formating related, and have not changed the question fundamentally; even a little bit. check again – the beest May 11 '16 at 20:48
  • I am so sorry, I just posted that comment on a completely wrong question. That was not intended for your question, but for a different question altogether. I apologise for my carelessness. – Sajib Acharya May 11 '16 at 20:50
  • @SajibAcharya it's ok, i thought you might of – the beest May 11 '16 at 20:54
  • @SajibAcharya any feedback on the points addressed in the update? – the beest May 11 '16 at 22:29

1 Answers1

0

What a tangled web. Let's start with some principles...

  • Inserts, replacements, and deletes can occur while a client is scanning the list?

  • You need a unique key for each row in the table. This should be the PRIMARY KEY, and may as well be an id .. AUTO_INCREMENT.

  • You to fetch a range or rows ordered by time. Suggest INDEX(date_created, id). This is ordered, and has no dups (because of id). There is no need to say UNIQUE instead of INDEX. Do not use OFFSET.

  • You need to replace an existing item. For that, you need a unique key. It could be id, if you can hang on to the value until you need to do the replacement. Or you could have another column (or combination of columns) that are unique in order to determine which row to reinsert or replace or update. Note that you will probably want the old timestamp to be kept. You probably don't care if the id changes; in the rare case of a duplicate timestamp, the pair of items may be swapped.

Rick James
  • 135,179
  • 13
  • 127
  • 222