
I'm looking for a simple way to store and retrieve a count of how many times a user views a unique post/page, e.g. /post/1, /post/2. I'm also interested in the total view count of each post.

How I think I should implement this:

  1. Create a userpostview table with columns (userid, postid, datecreated)
  2. Insert a userpostview row with (userid, postid) on each view, if the record does not already exist
  3. Retrieve the total count of rows where userid == X
  4. Retrieve the total count of rows where postid == Y
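The steps above can be sketched with SQLAlchemy like this (a minimal sketch; the model, column names, and the in-memory SQLite engine are stand-ins for the real schema and Postgres connection):

```python
from datetime import datetime, timezone

from sqlalchemy import (Column, DateTime, Integer, UniqueConstraint,
                        create_engine, func)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class UserPostView(Base):
    __tablename__ = "userpostview"
    id = Column(Integer, primary_key=True)
    userid = Column(Integer, nullable=False)
    postid = Column(Integer, nullable=False)
    datecreated = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    # One row per (user, post) pair, matching step 2.
    __table_args__ = (UniqueConstraint("userid", "postid"),)

engine = create_engine("sqlite://")  # stand-in for the real Postgres URL
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Step 2: insert only if this (user, post) pair is not already recorded.
    if not session.query(UserPostView).filter_by(userid=1, postid=2).first():
        session.add(UserPostView(userid=1, postid=2))
    session.commit()

    # Step 3: how many distinct posts user 1 has viewed.
    user_views = session.query(func.count(UserPostView.id)) \
                        .filter_by(userid=1).scalar()
    # Step 4: how many distinct users have viewed post 2.
    post_views = session.query(func.count(UserPostView.id)) \
                        .filter_by(postid=2).scalar()
```

The unique constraint lets the database, rather than application code, guarantee at most one row per (user, post) pair.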

For a small-scale app, a few rows is okay. But I'm anticipating a high percentage of different users reading different posts per day, so the userpostview table would grow much larger than either the user table or the post table. In the long run, I believe this will become a performance problem, correct? For instance, running these counts continuously could slow down reads. I'm wondering if caching the count would be ideal in this situation, or whether there should be a separate table holding an incremental counter.
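The "separate counter table" idea could look something like this (a sketch only; table and column names are assumptions, and sqlite3 stands in for Postgres). The counter is bumped at write time so reads never need a COUNT(*) over the large detail table:

```python
import sqlite3  # stand-in for a real Postgres connection

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postviewcount ("
    "postid INTEGER PRIMARY KEY, viewcount INTEGER NOT NULL)"
)

def bump(postid):
    # Try an in-place increment first; fall back to inserting the
    # first row for this post. Avoids COUNT(*) over the detail table.
    cur = conn.execute(
        "UPDATE postviewcount SET viewcount = viewcount + 1 "
        "WHERE postid = ?", (postid,))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO postviewcount (postid, viewcount) VALUES (?, 1)",
            (postid,))
    conn.commit()

bump(1)
bump(1)
count = conn.execute(
    "SELECT viewcount FROM postviewcount WHERE postid = 1").fetchone()[0]
```

In Postgres this would typically be a single `INSERT ... ON CONFLICT ... DO UPDATE` statement, run in the same transaction as the userpostview insert.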

I'm using Flask with a SQLAlchemy backend to PostgreSQL. Is this sufficient?

However after doing a bit of research, I saw this:

https://www.youtube.com/watch?v=G-lGCC4KKok

http://highscalability.com/blog/2012/3/26/7-years-of-youtube-scalability-lessons-in-30-minutes.html

Should I be "faking" the counter? Is there a better approach? Any better or improved suggestions are welcomed.

Thanks in advance.

user805981

1 Answer


Are you logging user interactions with your application?

Logging can be a great way to store data like this. Just set up a simple logger in your Flask application that writes a log entry each time a user loads a post page. Save the user_id, post_id, and timestamp to the log file. (Note: if you are worried about log files getting too large, you can configure the logger to start a new file every day.)
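A minimal sketch of that setup (the route, field names, and the hard-coded user id are assumptions), using the standard library's TimedRotatingFileHandler so a new file starts each day:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

from flask import Flask

app = Flask(__name__)

view_log = logging.getLogger("post_views")
view_log.setLevel(logging.INFO)
# Rotate at midnight so each day's views land in their own file.
handler = TimedRotatingFileHandler("post_views.log", when="midnight")
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
view_log.addHandler(handler)

@app.route("/post/<int:post_id>")
def show_post(post_id):
    user_id = 1  # stand-in for the real authenticated user's id
    view_log.info("user_id=%s post_id=%s", user_id, post_id)
    return "post %d" % post_id
```

Because the log write is append-only, it adds almost no latency to the request compared with a database insert.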

After you set up logging, you can use utilities to parse your logs and give you aggregated results like the ones you asked for. Or you can even parse the logs with Python and run custom "queries".
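For example, parsing such a log with plain Python (assuming a hypothetical `user_id=… post_id=…` line format) reduces to a couple of Counters:

```python
import re
from collections import Counter

views_per_user = Counter()
views_per_post = Counter()

line_re = re.compile(r"user_id=(\d+) post_id=(\d+)")

# Hypothetical log lines; in practice, iterate over open("post_views.log").
log_lines = [
    "2015-05-23 10:00:01 user_id=1 post_id=2",
    "2015-05-23 10:00:05 user_id=1 post_id=3",
    "2015-05-23 10:01:09 user_id=2 post_id=2",
]

for line in log_lines:
    m = line_re.search(line)
    if m:
        user_id, post_id = m.groups()
        views_per_user[user_id] += 1   # views by this user
        views_per_post[post_id] += 1   # views of this post

print(views_per_user["1"])  # 2
print(views_per_post["2"])  # 2
```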

I recommend using the ELK stack. Logstash, Elasticsearch, and Kibana are free and open source projects. Within an hour you could have beautiful charts and graphs of a user's activity on your site.

nivix zixer
  • Hmm.. this looks interesting. I'm not necessarily interested in the charts or graphs at the moment. How does my main app/api get the counts? What is the benefit of ELK stack over adding new rows into a postgresql database? – user805981 May 23 '15 at 03:58
  • Oh! I didn't know you wanted the counts to be part of your application data. I thought you wanted to see user activity for sys-ops tracking and such. If you want your application to use the userpostview data, then it must be in the database. I recommend Postgres table partitioning. https://blog.engineyard.com/2013/scaling-postgresql-performance-table-partitioning – nivix zixer May 23 '15 at 12:55
  • This looks more like what I'm looking for. What are your thoughts of using a nosql database alongside a postgres, where nosql database handles the view counter record? – user805981 May 23 '15 at 14:35
  • It could increase performance if you use something efficient like Redis. But it really depends on how much traffic you are expecting. Have you tried implementing a postgres solution first and testing for bottlenecks? – nivix zixer May 23 '15 at 14:53
  • Do you have a small example of using Redis in this particular way? I'm currently already using RabbitMQ to do asynchronous tasks. Would it be smarter to use Celery and RabbitMQ to do the insert and then use a caching mechanism to pull the counts? – user805981 May 23 '15 at 15:08