3

I am working on a website that will host a very large number of stories in all formats: text, videos, photos, and other multimedia. Stories can be filtered in several ways: "new", which lists the latest stories first; "featured", for stories that are manually marked as featured; and "popular", for which I need to come up with an algorithm.

So far I am taking the average of Facebook likes, the number of shares (Facebook, Twitter, and any other network), and the number of views. This doesn't look right to me, because giving equal weight to all three metrics is not sound, for reasons such as social spamming.

I am looking for good algorithms for ranking the popularity of stories.


Chandan Gupta
  • [Popularity Algorithm](http://stackoverflow.com/questions/1025436/popularity-algorithm) only discusses an algorithm based on "likes", which groups results by time window: popular for the day, week, and month. [This](http://stackoverflow.com/questions/9570384/algorithm-for-ranking-popular-blog-posts) has an answer that nearly answers my question, but not exactly, because the metrics are assumed there. I am looking for an exact metric with a genuine explanation, e.g. "facebook *2" with an explanation of why *2 for Facebook. I hope this is not a duplicate now! – Chandan Gupta Mar 10 '14 at 07:12

1 Answer

3

I'd suggest trying a regression algorithm. Linear regression is the most widely used; if that model does not fit your data, feel free to explore others.

  1. First, determine the features of each story. Your features are likes, tweets, shares, views, ... I'd also add a boolean indicator (a variable that can only take the values 0 or 1) for each of the story types (video/photo/...).
  2. Next, create a training set: a set of stories to which you (or other human experts) have assigned a score.
  3. Now, using these features and the training set, use a regression algorithm to build the model that best fits your features to the examples you have already scored. (1)
  4. Once you have a model, you can use it to score all the other stories.
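The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature order `(likes, shares, views, is_video)`, the helper names `fit_linear_regression` and `predict`, and all the numbers (including the "human" scores) are made up for the example. It fits plain ordinary least squares by solving the normal equations; in practice a library implementation would probably be a better choice.

```python
def fit_linear_regression(rows, scores):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns one weight per feature, followed by the intercept.
    """
    X = [list(r) + [1.0] for r in rows]  # constant 1.0 column for the intercept
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, scores))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; score more varied stories")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, features):
    """Score one story with the fitted weights (the last weight is the intercept)."""
    return sum(w * x for w, x in zip(weights, list(features) + [1.0]))


# Step 1: features per story, here (likes, shares, views, is_video). Hypothetical data.
training_features = [
    (10, 2, 100, 0), (50, 5, 400, 1), (5, 20, 300, 0), (80, 1, 900, 1),
    (30, 30, 200, 0), (0, 0, 50, 1), (60, 10, 1000, 0),
]
# Step 2: scores assigned by human raters to those same stories (made-up values).
human_scores = [2.0, 5.5, 6.0, 7.0, 8.5, 1.0, 7.5]

# Step 3: fit the model to the scored examples.
weights = fit_linear_regression(training_features, human_scores)

# Step 4: score stories nobody has rated, then sort by predicted popularity.
unscored = {"story-a": (20, 4, 250, 1), "story-b": (70, 25, 800, 0)}
ranking = sorted(unscored, key=lambda name: predict(weights, unscored[name]), reverse=True)
print(ranking)
```

The fitted weights play the role of the "facebook *2" style multipliers asked about in the question, except that they are learned from the scored examples rather than chosen by hand.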

Regarding spammer detection, you could try anomaly detection algorithms.
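As one simple illustration of anomaly detection (not necessarily the technique to settle on), the sketch below flags stories whose shares-per-view ratio is far from the typical value, using the modified z-score based on the median absolute deviation. The function name `flag_anomalies`, the choice of ratio, the sample data, and the conventional 3.5 cutoff are all assumptions made for the example.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation (MAD),
    so a few extreme values do not distort the baseline they are compared against.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked as unusual
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical (shares, views) pairs; the last story has almost as many shares as views.
stories = [(2, 100), (12, 400), (5, 200), (9, 500), (11, 500), (450, 500)]
ratios = [shares / views for shares, views in stories]
suspicious = flag_anomalies(ratios)
print(suspicious)
```

Stories flagged this way could be excluded from the training set or reviewed by hand before their metrics feed into the popularity score.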


(1) Actually, steps 2 and 3 can be done together using active regression techniques. In active regression, the learner (the algorithm) asks you to score the examples that will help it learn fastest. In my experiments, PAlice performed very well as an active regression algorithm.

amit
  • But isn't it difficult for humans to manually score a post based on its number of likes and comments? I mean, humans cannot even say exactly whether it's a popular post, right? – eshb Jul 26 '17 at 02:20