
Here is my current structure:

// posts
+----+--------+----------+-----------+------------+
| id | title  | content  | author_id | date_time  |
+----+--------+----------+-----------+------------+
| 1  | title1 | content1 | 435       | 1468111492 |
| 2  | title2 | content2 | 657       | 1468113910 |
| 3  | title3 | content3 | 712       | 1468113791 |
+----+--------+----------+-----------+------------+

// viewed
+----+---------------+---------+------------+
| id | user_id_or_ip | post_id | date_time  |
+----+---------------+---------+------------+
| 1  | 324           | 1       | 1468111493 |
| 2  | 546           | 3       | 1468111661 |
| 3  | 135.54.12.1   | 1       | 1468111691 |
| 5  | 75            | 1       | 1468112342 |
| 6  | 56.26.32.1    | 2       | 1468113190 |
| 7  | 56.26.32.1    | 3       | 1468113194 |
| 8  | 75            | 2       | 1468112612 |
+----+---------------+---------+------------+

Here is my query:

SELECT p.*,
       (SELECT count(*) FROM viewed WHERE post_id = :id) AS total_viewed
 FROM posts p
WHERE id = :id

Currently I'm facing a huge amount of data in the viewed table. What's wrong with my table structure (or database design)? In other words, how can I improve it?

A website like Stack Overflow has almost 12 million posts. Each post has (on average) 500 views. So the number of rows in viewed should be:

12000000 * 500 = 6,000,000,000 rows

Hah :-) .. Honestly I cannot even read that number (and it keeps growing every second). So how does Stack Overflow handle the view count for each post? Does it really run count(*) against viewed every time a post is displayed?

Martin AJ
  • Good question. What you need to look at is in-memory storage (Redis): count(*) is not calculated every time someone opens a post; instead, a key in the in-memory store is incremented and later added to the table at regular intervals – Yehia Awad Jul 10 '16 at 01:09
  • @YehiaAwad So I need a NoSQL database like Redis to handle that? – Martin AJ Jul 10 '16 at 01:15
  • A question can have an int, like, view count. An action on a question like a vote of some type can be in an intersect or [Junction Table](http://stackoverflow.com/a/32620163). 900M people can view this question (I hope they don't), and it could increment a view count int along the way. No rows necessarily added. Then there is partitioning. – Drew Jul 10 '16 at 02:00
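
A minimal sketch of the flush step that the Redis comment describes, assuming a per-post counts table like the ViewCounts table in the answer below; the :post_id and :delta placeholders are hypothetical, with :delta being the value read (and reset) from the Redis key:

-- run at some interval: fold the accumulated Redis counter into MySQL
INSERT INTO ViewCounts (post_id, ct)
VALUES (:post_id, :delta)
ON DUPLICATE KEY UPDATE ct = ct + VALUES(ct);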

1 Answer


You are not likely to need partitioning, Redis, NoSQL, etc., until you have many millions of rows. Meanwhile, let's see what we can do with what you do have.

Let's start by dissecting your query. I see WHERE id = :id but no LIMIT or ORDER BY. Presumably you will also want listing queries, such as a user's latest posts. For that, let's add to your posts table

INDEX(author_id, date_time)

and use

WHERE author_id = :author_id
ORDER BY date_time DESC
LIMIT 10

Any index is sorted by what is indexed. That is, the 10 rows you are looking for are adjacent to each other. Even if the data is pushed out of cache, there will probably be only one block to read to provide those 10 rows.

But a "row" in a secondary index in InnoDB does not contain the data to satisfy SELECT *. The index "row" contains a pointer to the actual 'data' row. So, there will be 10 lookups to get them.

As for view count, let's implement that a different way:

CREATE TABLE ViewCounts (
    post_id ...,
    ct MEDIUMINT UNSIGNED NOT NULL,
    PRIMARY KEY(post_id)
) ENGINE=InnoDB;
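
To keep ct up to date, each view could do a single-statement upsert (a minimal sketch; the batched Redis flush from the comments is another option):

INSERT INTO ViewCounts (post_id, ct)
VALUES (:post_id, 1)                    -- :post_id = the post being viewed
ON DUPLICATE KEY UPDATE ct = ct + 1;    -- bump the existing counter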

Now, given a post_id, it is very efficient to drill down the BTree to find the count. JOINing this table to the other, we get the individual counts with another 10 lookups.
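
Putting the pieces together, the listing query might look roughly like this (a sketch assuming the INDEX(author_id, date_time) suggestion above; use LEFT JOIN with COALESCE(vc.ct, 0) if a post may not have a ViewCounts row yet):

SELECT p.*, vc.ct AS total_viewed
  FROM posts p
  JOIN ViewCounts vc ON vc.post_id = p.id
 WHERE p.author_id = :author_id
 ORDER BY p.date_time DESC
 LIMIT 10;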

So, you say, "why not put them in the same table"? The reason is that ViewCounts is changing so frequently that those actions will clash with other activity on Postings. It is better to keep them separate.

Even though we hit a couple dozen blocks, that is not bad compared to scanning millions of rows. And, this kind of data is somewhat "cacheable". Recent postings are more frequently accessed. Popular users are more frequently accessed. So, 100GB of data can be adequately cached in 10GB of RAM. Scaling is all about "counting the disk hits".

Rick James