2

I am building a blog aggregator like Techmeme that finds the most popular posts from several blogs. Unlike Techmeme, I first aggregate blog posts from a variety of RSS feeds, then save the headlines and relevant URLs in a database. After that, I have to find which blog posts are the most popular.

To determine the top blog post headlines, I track Facebook and Twitter share counts for every post of every blog and rank the posts by their share counts. But that isn't the best solution, because some bloggers can cheat by inflating their counts with fraudulent shares.

So my question is: what criteria could I use to determine the most popular posts? What would be a better algorithm for ranking blog posts?

Community
  • 1
  • 1
  • Google Trends gives a daily unique visitor count. However, it doesn't look like there is any kind of official API for it. Not really sure how well it would work with blog posts, since I figure they likely aren't navigated to from a Google search. http://trends.google.com/websites – Danny Mar 05 '12 at 16:50
  • But there isn't data for all blogs or blog posts; it only covers globally popular ones. Since my project is local, not global, this tool doesn't help me :( –  Mar 05 '12 at 16:56

3 Answers

2

Since the term 'popular' in this context is vague, I would define the popularity of posts according to my own criteria. Combine all the suggested answers and build a reasonable reputation system for the blog posts. For instance, I would do something like this:

  • facebook share x 2
  • twitter share x 3
  • pagerank of the domain x 2
  • 50 000 / global alexa rating
  • and so on

Finally, you can sum all of these up and compare. Moreover, you can develop further criteria that take into account the size of the post, the number of images within it, etc.
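Here is a minimal sketch of that weighted sum in Python, assuming the share counts and domain metrics have already been fetched and stored with each post; the field names and the weights are just the illustrative values from above, not recommendations:

```python
def popularity_score(post):
    """Combine several signals into a single popularity score."""
    score = 0.0
    score += post.get("facebook_shares", 0) * 2
    score += post.get("twitter_shares", 0) * 3
    score += post.get("domain_pagerank", 0) * 2
    alexa_rank = post.get("alexa_rank")
    if alexa_rank:  # lower Alexa rank means more traffic
        score += 50_000 / alexa_rank
    return score

posts = [
    {"title": "Post A", "facebook_shares": 120, "twitter_shares": 80,
     "domain_pagerank": 5, "alexa_rank": 25_000},
    {"title": "Post B", "facebook_shares": 300, "twitter_shares": 10,
     "domain_pagerank": 2, "alexa_rank": 400_000},
]

# Rank posts by the combined score, highest first.
for post in sorted(posts, key=popularity_score, reverse=True):
    print(post["title"], round(popularity_score(post), 1))
```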

seferov
  • 4,111
  • 3
  • 37
  • 75
  • How do you decide the multiplicative factor for shares/likes etc. I mean, why (Facebook share x 2) and not (Facebook share x 30) – Jayesh Dec 22 '13 at 08:54
  • @Jayesh I just made them up for the sake of the example. It is up to you (the importance you give to each signal). – seferov Dec 22 '13 at 21:59
  • Thanks @Ferhad I just wanted to understand what's the process to get there. Is it always that you start with random weights and try to adjust the error over a period or is there any definitive way to get these? – Jayesh Dec 23 '13 at 04:36
0

It may be possible to estimate the joint distribution of shares across different sources. It's hard to detect fraud from any single (marginal) metric, but it's much harder to fake a holistic, "organic" profile across all of them.
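A rough sketch of how that could look, assuming each post record already carries its Facebook and Twitter share counts; the log-ratio z-score used here is just one illustrative way to flag posts whose profile deviates from the crowd:

```python
import math
import statistics

def suspicious_posts(posts, threshold=3.0):
    """Flag posts whose Facebook/Twitter share ratio is an outlier."""
    log_ratios = [
        math.log((p["facebook_shares"] + 1) / (p["twitter_shares"] + 1))
        for p in posts
    ]
    mean = statistics.mean(log_ratios)
    stdev = statistics.stdev(log_ratios) or 1.0  # avoid division by zero
    flagged = []
    for post, log_ratio in zip(posts, log_ratios):
        z = (log_ratio - mean) / stdev
        if abs(z) > threshold:  # shares on one network wildly out of line
            flagged.append((post["title"], round(z, 2)))
    return flagged
```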

phs
  • 10,687
  • 4
  • 58
  • 84
0

How about using a variation of PageRank?

Here are more details: http://pr.efactory.de/e-pagerank-algorithm.shtml and http://en.wikipedia.org/wiki/PageRank
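For illustration, a bare-bones power-iteration PageRank over a link graph of blog posts, assuming you have already extracted which posts link to which; this follows the standard algorithm described at those links, simplified by ignoring dangling-node mass:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each post URL to the list of URLs it links to."""
    nodes = set(links) | {u for outs in links.values() for u in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Base (teleport) probability for every node.
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, outs in links.items():
            if not outs:
                continue  # dangling node: its mass is dropped for simplicity
            share = damping * rank[node] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "blog-a/post-1": ["blog-b/post-2"],
    "blog-b/post-2": ["blog-a/post-1", "blog-c/post-3"],
    "blog-c/post-3": [],
}
for url, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(url, round(score, 3))
```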

Andrew
  • 7,619
  • 13
  • 63
  • 117