
I'm trying to implement logic in Apache Solr so that documents older than 2 years get a penalty based on their age in days or months.

I am using this boost function, which I got after googling a lot.

 recip(ms(NOW,publicationDate),3.16e-11,1,1) // Currently it is set to use 1 year

Can anyone please confirm whether this penalizes old documents?

Thanks

sehe
Farhan Tahir

1 Answer


recip(x,m,a,b) is a reciprocal function that implements a/(m*x+b), where m, a, and b are constants and x is any numeric field or arbitrarily complex function.

[Image: graph of the reciprocal function]

With your parameters, the function looks like this:

f(x) = 1 /(3.16e-11*x + 1)

The ms function returns the difference between its arguments in milliseconds.

Dates are relative to the Unix or POSIX time epoch, midnight, January 1, 1970 UTC.

Suppose your publication date is September 1st 2015. Then ms works with NOW = 1507725936061 and a publication date of 1441065600000, and the whole result is around 0.3, which will be the score for this document.

For a publication date of yesterday, we get a score of about 0.99. So this formula applies a penalty to every document, not only to the ones that are 2 years old. For example, for the same day 1 year ago the score will be 0.5.
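The numbers above can be checked with a small Python sketch. This is not Solr code: `recip` and `age_score` are hypothetical local helpers that only mirror the formula a/(m*x+b), and the timestamps are the ones quoted in the example.

```python
def recip(x, m, a, b):
    """Local mirror of the formula a / (m*x + b); not a Solr API."""
    return a / (m * x + b)


MS_PER_DAY = 24 * 60 * 60 * 1000
NOW_MS = 1507725936061        # NOW from the example above
PUBLISHED_MS = 1441065600000  # September 1st 2015, from the example above


def age_score(age_ms):
    """Score for a document that is age_ms milliseconds old."""
    return recip(age_ms, 3.16e-11, 1, 1)


score_2015 = age_score(NOW_MS - PUBLISHED_MS)
score_yesterday = age_score(MS_PER_DAY)
score_one_year = age_score(365 * MS_PER_DAY)

print(round(score_2015, 3), round(score_yesterday, 3), round(score_one_year, 3))
```

The three values come out at roughly 0.32, 0.997 and 0.50, which matches the scores described above: the penalty grows smoothly with age and is already noticeable after one year.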

You could potentially sort by a function like this (starting from Solr 6):

if(gt(ms(publicationDate,NOW-2YEARS),0),1,recip(ms(NOW,publicationDate),3.16e-11,1,1))

I didn't test it (I'm not sure about the NOW-2YEARS part), but basically I'm doing this:

if publicationDate - (NOW-2YEARS) greater
    than 0 => score will be 1.0 (document is newer than 2 years)
    else   => score is the reciprocal function
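To see how this conditional behaves, here is a rough Python model of it. It is only a sketch: `conditional_score` is a hypothetical helper, and it assumes a year is exactly 3.16e10 milliseconds, whereas Solr's NOW-2YEARS is calendar-based.

```python
MS_PER_YEAR = 3.16e10  # approximate milliseconds per year (assumption)


def recip(x, m, a, b):
    """Local mirror of the formula a / (m*x + b); not a Solr API."""
    return a / (m * x + b)


def conditional_score(age_ms, cutoff_ms=2 * MS_PER_YEAR):
    """Model of if(gt(ms(date,NOW-2YEARS),0),1,recip(ms(NOW,date),3.16e-11,1,1)).

    ms(date, NOW-2YEARS) > 0 is the same as saying the document's age
    (NOW - date) is smaller than the cutoff.
    """
    if age_ms < cutoff_ms:
        return 1.0
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0.5, 1.9, 2.0, 3.0, 4.0):
    print(years, round(conditional_score(years * MS_PER_YEAR), 3))
```

In this model, documents younger than two years all score 1.0, and the score then jumps down to about 0.33 at the two-year mark before continuing to decrease.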

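One last remark: there are about 3.16e10 milliseconds in a year, so one can scale dates to fractions of a year with the inverse, or 3.16e-11. With that value the score is 0.5 at one year. If you want the score to be 0.5 at 2 years instead, you may select something different, for example half of it, about 1.58e-11.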
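The constant can be derived with a few lines of Python. This is a sketch under one assumption that is not in the original: an average year of 365.25 days. The names `m_one_year` and `m_two_years` are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # assumption: average year length


def recip(x, m, a, b):
    """Local mirror of the formula a / (m*x + b); not a Solr API."""
    return a / (m * x + b)


# m is the inverse of the time span at which the score should be 0.5
m_one_year = 1 / MS_PER_YEAR         # close to the 3.16e-11 used above
m_two_years = 1 / (2 * MS_PER_YEAR)  # close to 1.58e-11

half_at_one = recip(MS_PER_YEAR, m_one_year, 1, 1)
half_at_two = recip(2 * MS_PER_YEAR, m_two_years, 1, 1)

print(m_one_year, m_two_years, half_at_one, half_at_two)
```

With a = b = 1, the score equals 0.5 exactly when m*x = 1, so choosing m as the inverse of a time span sets the age at which a document's score is halved.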

Mysterion
  • Thanks for the explanation. Can I tune it so that it only applies a penalty to documents that are two years old? – Farhan Tahir Oct 11 '17 at 12:56
  • Hi, can you please also tell me the value that I should use for 2 years instead of 3.16e-11? Thanks – Farhan Tahir Oct 11 '17 at 13:42
  • 3.16e-11 should be fine – Mysterion Oct 11 '17 at 13:50
  • By the way, what exactly does reciprocal do in simple terms? Does it boost documents or penalize them? – Farhan Tahir Oct 11 '17 at 13:56
  • In simple terms it usually penalizes. Take a look at the graph I've posted: the function is decreasing. – Mysterion Oct 11 '17 at 13:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/156476/discussion-between-farhan-tahir-and-mysterion). – Farhan Tahir Oct 11 '17 at 14:06
  • I'm using this formula: if(gt(ms(created_at,NOW-2YEARS),0),3,recip(ms(NOW,created_at),3.16e-11,1,1)). But I have to use 3 instead of 1, as using 1 was returning documents from 2013 and 2014 as well, which I don't want. Why do you think it showed those documents? – Farhan Tahir Oct 11 '17 at 14:08