35

I have been reading + researching on algorithms and formulas to work out a score for my user submitted content to display currently hot / trending items higher up the list, however i'll admit i'm a little over my head here.

I'll give some background on what i'm after... users upload audio to my site, audios have several actions:

  • Played
  • Downloaded
  • Liked
  • Favorited

Ideally i want an algorithm where I can update an audios score each time a new activity is logged (played, download etc...), also a download action is worth more than a play, like more than a download and a favourite more than a like.

If possible i would like for audios older than 1 week to drop off quite sharply from the list to give newer content more of a chance of trending.

I have read about reddits algorithm which looked good, but i'm in over my head on how to tweak it to make use of my multiple variables, and to drop off older articles after around 7 days.

Some articles that we're interesting:

Any help is appreciated!

Paul

Zeta
  • 103,620
  • 13
  • 194
  • 236
Paul Hinett
  • 1,951
  • 2
  • 26
  • 40
  • Hot content with time decay? Are you talking about the [heat equation](http://en.wikipedia.org/wiki/Heat_equation)? Nah, seriously, you have to think about this yourself - although the heat equation could give you some ideas. – Zeta Jul 25 '12 at 15:50
  • well the question is really what sort of equation am i looking for, I suppose Zeta answered this. I'm not exactly hot on maths and equations at this level, I was hoping someone had any experience with this before and found some useful blogs etc. – Paul Hinett Jul 25 '12 at 16:26

1 Answers1

68

Reddits old formula and a little drop off

Basically you can use Reddit's formula. Since your system only supports upvotes you could weight them, resulting in something like this:

def hotness(track)
    s = track.playedCount
    s = s + 2*track.downloadCount
    s = s + 3*track.likeCount
    s = s + 4*track.favCount
    baseScore = log(max(s,1))

    timeDiff = (now - track.uploaded).toWeeks

    if(timeDiff > 1)
        x = timeDiff - 1
        baseScore = baseScore * exp(-8*x*x)

    return baseScore

The factor exp(-8*x*x) will give you your desired drop off:

Exponential drop off

The basics behind

You can use any function that goes to zero faster than your score goes up. Since we use log on our score, even a linear function can get multiplied (as long as your score doesn't grow exponentially).

So all you need is a function that returns 1 as long as you don't want to modify the score, and drops afterwards. Our example above forms that function:

multiplier(x) = x > 1 ? exp(-8*x*x) : 1

You can vary the multiplier if you want less steep curves. varying multiplier

Example in C++

Lets say that the probability for a given track to be played in a given hour is 50%, download 10%, like 1% and favorite 0.1%. Then the following C++ program will give you an estimate for your scores behavior:

#include <iostream>
#include <fstream>
#include <random>
#include <ctime>
#include <cmath>

struct track{
    track() : uploadTime(0),playCount(0),downCount(0),likeCount(0),faveCount(0){}
    std::time_t uploadTime;    
    unsigned int playCount;
    unsigned int downCount;
    unsigned int likeCount;
    unsigned int faveCount;    
    void addPlay(unsigned int n = 1){ playCount += n;}
    void addDown(unsigned int n = 1){ downCount += n;}
    void addLike(unsigned int n = 1){ likeCount += n;}
    void addFave(unsigned int n = 1){ faveCount += n;}
    unsigned int baseScore(){
        return  playCount +
            2 * downCount +
            3 * likeCount +
            4 * faveCount;
    }
};

int main(){
    track test;
    const unsigned int dayLength = 24 * 3600;
    const unsigned int weekLength = dayLength * 7;    

    std::mt19937 gen(std::time(0));
    std::bernoulli_distribution playProb(0.5);
    std::bernoulli_distribution downProb(0.1);
    std::bernoulli_distribution likeProb(0.01);
    std::bernoulli_distribution faveProb(0.001);

    std::ofstream fakeRecord("fakeRecord.dat");
    std::ofstream fakeRecordDecay("fakeRecordDecay.dat");
    for(unsigned int i = 0; i < weekLength * 3; i += 3600){
        test.addPlay(playProb(gen));
        test.addDown(downProb(gen));
        test.addLike(likeProb(gen));
        test.addFave(faveProb(gen));    

        double baseScore = std::log(std::max<unsigned int>(1,test.baseScore()));
        double timePoint = static_cast<double>(i)/weekLength;        

        fakeRecord << timePoint << " " << baseScore << std::endl;
        if(timePoint > 1){
            double x = timePoint - 1;
            fakeRecordDecay << timePoint << " " << (baseScore * std::exp(-8*x*x)) << std::endl;
        }
        else
            fakeRecordDecay << timePoint << " " << baseScore << std::endl;
    }
    return 0;
}

Result:

Decay

This should be sufficient for you.

Zeta
  • 103,620
  • 13
  • 194
  • 236
  • 3
    Thank you for taking the time to explain this clearly...will test it out this evening with my data. – Paul Hinett Jul 25 '12 at 17:34
  • @Zeta How did you come up with exp(-8*x*x)? I need to apply this to a similar problem involving `created_at` and `updated_at` timestamps where I sort by `updated_at` but I need it to drop off after 6 hours of difference from `created_at` and I'm not sure how to tweak the formula for that. – paulkon Nov 28 '13 at 01:39
  • 4
    @paulkon that might be a bit late to answer... look at the first graph (the red one) in Zeta's answer: this is the graph for exp(-8*x*x), showing the dropoff applied to baseScore once the track is more than one week old. To get your dropoff after 6 hours, you'd do something like `timeDiff = (now - track.created_at).toHours` and then: `if timeDiff > 6 ; x = timeDiff - 6; baseScore *= exp(-8*x*x)`. Tweak the 8 in the exponential function: the higher the value, the steeper the dropoff :) With -8 : http://fooplot.com/plot/h0nfqukrj8 With -50 : http://fooplot.com/plot/e57bc1osnv – Olivier Lance Jan 18 '14 at 11:53
  • @OlivierLance Thanks, the graphs made it easier to visualize. I have a feeling I'll be using that site more :) – paulkon Jan 19 '14 at 00:57
  • 1
    Very nice and helpful answer. Thanks @Zeta! – Vignesh Prajapati Nov 17 '15 at 10:38
  • @zeta Can you share how did you generate the graph? – Piyush Apr 26 '17 at 13:45
  • @Piyush almost 5 years ago? I _guess_ with `gnuplot`. Something like `plot 'fakeRecord.dat' t "Without decay", 'fakeRecordDecay.dat' t "With decay"`. But I don't have `gnuplot` at hand at the moment. – Zeta Apr 26 '17 at 15:56
  • @zeta Thanks :) I will check it. BTW Now we have many online tools to generate graphs. – Piyush Apr 26 '17 at 16:19
  • 1
    @Piyush Used `gnuplot` definitely, `plot … w l` and appropriate labels. – Zeta Feb 02 '18 at 07:05