
I'm using Apache Solr 4.7.2.

I need to implement the following behavior: the user provides a list of IDs, and Solr returns the matching documents, paginated and in the same order in which the user supplied the IDs.

I came across the boost-terms approach. So if the user provides the IDs "2875141 2873071 2875198 108142 2918841 2870688 107920 2870637 2870636 2870635 2918792 107721 2875078 2875166 2875151 2918829 2918808", my Solr query will be:

studentId:(2875141^16 2873071^15 2875198^14 108142^13 2918841^12 2870688^11 107920^10 2870637^9 2870636^8 2870635^7 2918792^6 107721^5 2875078^4 2875166^3 2875151^2 2918829^2 2918808^1) 
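A small Java sketch of how a query string like this can be generated from the user's list. It gives the i-th of n IDs the boost n - i, so the first ID gets ^n and the last gets ^1. The class and method names are illustrative only. `studentId` is the field from the query above.

```java
import java.util.List;
import java.util.StringJoiner;

class BoostQueryBuilder {
    // Builds "field:(id1^n id2^(n-1) ... idn^1)" so that earlier IDs get larger boosts.
    static String linearBoostQuery(String field, List<String> ids) {
        StringJoiner terms = new StringJoiner(" ", field + ":(", ")");
        int n = ids.size();
        for (int i = 0; i < n; i++) {
            terms.add(ids.get(i) + "^" + (n - i));
        }
        return terms.toString();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("2875141", "2873071", "2875198");
        System.out.println(linearBoostQuery("studentId", ids));
        // prints studentId:(2875141^3 2873071^2 2875198^1)
    }
}
```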

But this approach does not always work. In this specific example, the explain output for this query shows that the highest score does not go to the ^16 term.

If I use large boost values such as 1, 10, 100, 1000, 10000 and so on (adding one 0 per position), as suggested in a cookbook, the ordering works fine. But that becomes a problem if the user searches for, say, 200 items: the query gets too long and causes communication issues.
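The Java sketch below puts numbers on the length problem. It counts only the characters spent on the `^boost` suffixes for n IDs, comparing linear boosts with boosts that gain one extra zero per position. It also checks how many positions fit before 10^(n-1) overflows a 32-bit float. That check assumes the query parser stores boosts as floats, as Lucene 4.x's `Query.setBoost(float)` does. Names are illustrative.

```java
class BoostLengthComparison {
    // Characters spent on "^<boost>" suffixes when n IDs get boosts n, n-1, ..., 1.
    static long linearBoostChars(int n) {
        long chars = 0;
        for (int b = 1; b <= n; b++) {
            chars += 1 + Integer.toString(b).length(); // "^" plus the digits
        }
        return chars;
    }

    // Same, but boosts are 1, 10, 100, ... (one extra zero per position).
    static long powerOfTenBoostChars(int n) {
        long chars = 0;
        for (int k = 0; k < n; k++) {
            chars += 1 + (k + 1); // "^", a "1", and k zeros
        }
        return chars;
    }

    // Largest number of IDs for which the top boost 10^(n-1) is still a finite float.
    static int maxFinitePowerOfTenIds() {
        int n = 1;
        while (Float.isFinite(Float.parseFloat("1" + "0".repeat(n)))) {
            n++;
        }
        return n; // 10^n overflowed, so at most n IDs (top boost 10^(n-1)) fit
    }

    public static void main(String[] args) {
        System.out.println(linearBoostChars(200));
        System.out.println(powerOfTenBoostChars(200));
        System.out.println(maxFinitePowerOfTenIds());
    }
}
```

For 200 IDs this gives 692 characters of boosts with linear values, versus 20,300 with powers of ten. Under the float assumption, the powers-of-ten scheme also hits a hard ceiling at 39 IDs, regardless of query length.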

Is there any other way to achieve this? If not, could I use multiplication or exponential operations to get large boost factors with fewer characters?

Thanks

qxlab
  • Are you paginating through the results or retrieving all documents that match? If the latter, sorting could be done in your application code instead. Or you could use a custom similarity class that returns 1.0f as the score regardless of hits, then use boosts of `1..n` instead of having to work around the score issue. – MatsLindh Jul 15 '16 at 21:35
  • @MatsLindh thanks for commenting. I just edited the question: I am indeed paginating the results. I didn't quite understand your second suggestion; can it be applied to the pagination scenario? Thanks – qxlab Jul 15 '16 at 22:08
  • Did you try the first approach mentioned in your question inside the bq parameter? – AR1 Jul 16 '16 at 17:08
  • @AR1 I just tried it with edismax, using **q** = `productID:(2875141 2873071 2875198 108142 2918841 2870688 107920 2870637 2870636 2870635 2918792 107721 2875078 2875166 2875151^2 2918829 2918808)` and **bq** = `productID:(2875141^16 2873071^15 2875198^14 108142^13 2918841^12 2870688^11 107920^10 2870637^9 2870636^8 2870635^7 2918792^6 107721^5 2875078^4 2875166^3 2875151^2 2918829^2 2918808^1)`, but the **result was the same**. Thanks – qxlab Jul 16 '16 at 17:33
  • Did you try a custom similarity class [that always returns 1.0f as the score](https://stackoverflow.com/questions/20428709/solr-custom-similarity)? In recent versions of Solr you can provide the similarity class per field, which means that you can have a custom field that allows boosting by position in your query string. – MatsLindh Jul 16 '16 at 21:23
  • @MatsLindh would providing a similarity class per field make the entire query use that similarity, even if there are other fields in the query? If not, how can I make a specific query use that similarity class? Thanks! – qxlab Oct 04 '16 at 01:20
  • Does this answer your question? [Is it possible in solr to specify an ordering of documents](https://stackoverflow.com/questions/19813548/is-it-possible-in-solr-to-specify-an-ordering-of-documents) – Ahmad Abdelghany Oct 17 '22 at 12:01
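MatsLindh's first suggestion applies when you retrieve all matching documents rather than paginating in Solr. In that case, the only thing needed is a position lookup. Below is a minimal Java sketch, assuming the documents are already in memory. `idOf` is a hypothetical accessor for whichever field holds the ID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ClientSideOrder {
    // Returns a copy of docs sorted by the position of each doc's ID in requestedIds.
    // Docs whose ID is not in the list go last, keeping their original relative order.
    static <T> List<T> orderByRequestedIds(List<String> requestedIds, List<T> docs,
                                           Function<T, String> idOf) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < requestedIds.size(); i++) {
            position.putIfAbsent(requestedIds.get(i), i); // first occurrence wins
        }
        List<T> sorted = new ArrayList<>(docs);
        // List.sort is stable, so ties (unknown IDs) keep their input order.
        sorted.sort(Comparator.comparingInt(
                (T d) -> position.getOrDefault(idOf.apply(d), Integer.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("2875141", "2873071", "2875198");
        List<String> fromSolr = List.of("2875198", "2875141", "2873071");
        System.out.println(orderByRequestedIds(requested, fromSolr, id -> id));
        // prints [2875141, 2873071, 2875198]
    }
}
```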

2 Answers


A viable option is to write a custom Solr function that takes as input the name of the field and boosts each document by the position of its field value. For instance:

bq=myCustomBoostFunction(fieldName, boostFactor)

Here boostFactor could be optional, or you could omit it entirely. All boosting logic would live in the Java code of your function. This kind of solution has pros and cons:

Pros

  • the same function could be reused for other fields without any additional implementation;

  • the boost factor would allow you to tune your solution;

  • any calculation would be done within Java code without affecting the query length.

Cons

  • your function implementation could be slow if you read the content of the field instead of using payloads.
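The core of such a function is a position lookup built once per request. The plain-Java sketch below shows only that logic. It assumes the function receives the ordered ID list and an optional boostFactor. `PositionBoost` is hypothetical and is not a Solr or Lucene class. Wiring it into Solr as a function query would need Solr's plugin API, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical core of a position-based boost function; not a Solr/Lucene API.
class PositionBoost {
    private final Map<String, Integer> positions = new HashMap<>();
    private final int size;
    private final float boostFactor;

    PositionBoost(List<String> orderedIds, float boostFactor) {
        for (int i = 0; i < orderedIds.size(); i++) {
            positions.putIfAbsent(orderedIds.get(i), i); // first occurrence wins
        }
        this.size = orderedIds.size();
        this.boostFactor = boostFactor;
    }

    // Value for one document's field: the first ID gets size * boostFactor,
    // the last gets 1 * boostFactor, and IDs not in the list get 0.
    float value(String fieldValue) {
        Integer pos = positions.get(fieldValue);
        return pos == null ? 0f : (size - pos) * boostFactor;
    }

    public static void main(String[] args) {
        PositionBoost boost = new PositionBoost(List.of("2875141", "2873071", "2875198"), 1f);
        System.out.println(boost.value("2875141")); // 3.0
        System.out.println(boost.value("2875198")); // 1.0
        System.out.println(boost.value("999"));     // 0.0
    }
}
```

Only the ID list travels in the request; the per-position boosts are computed server-side. That is the "without affecting the query length" point above.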
AR1
  • AR1, would you know which Solr class I should extend in order to implement a new boost function? – qxlab Oct 04 '16 at 01:05

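The boost factors fail because the default similarity applies TF-IDF calculations, so differences in the idf part of the score can outweigh small differences between boosts. You could use a similarity that disables TF and IDF (a "NoTfIDF" similarity), which gives a score of 1 for a single-term match. Then the boosts alone determine the order.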
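This toy model shows how small linear boosts can be outranked. Under a simplified reading of Lucene 4.x's DefaultSimilarity, a single-term match scores roughly boost × idf², with idf = 1 + ln(numDocs / (docFreq + 1)). The model treats tf, norms and queryNorm as constant. Purely for illustration, it assumes the ^16 ID has docFreq 3 (for instance from deleted, not-yet-merged copies) while the ^15 ID has docFreq 1. It is not Lucene's exact formula.

```java
class ScoreToyModel {
    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Toy single-term score: boost * idf^2 (tf, norms and queryNorm treated as constant).
    static double tfIdfScore(double boost, long docFreq, long numDocs) {
        double i = idf(docFreq, numDocs);
        return boost * i * i;
    }

    // With a similarity that ignores tf/idf, the score reduces to the boost itself.
    static double constantScore(double boost) {
        return boost;
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        // Illustrative assumption: the ^16 ID's docFreq is inflated, the ^15 ID's is not.
        double s16 = tfIdfScore(16, 3, numDocs);
        double s15 = tfIdfScore(15, 1, numDocs);
        System.out.printf("tf-idf:   ^16 -> %.1f, ^15 -> %.1f%n", s16, s15);
        System.out.printf("constant: ^16 -> %.1f, ^15 -> %.1f%n", constantScore(16), constantScore(15));
    }
}
```

Under those assumptions the ^15 document scores about 781, against about 680 for the ^16 one. With a constant similarity the scores are simply 15 and 16.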

Alternatively, instead of a regular query, you can use the /get handler, which takes a list of IDs and returns the requested fields (fl) in the same order. You have to handle pagination yourself, though, i.e. send only the IDs that fall into the requested page.
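Handling pagination for the /get approach comes down to slicing the user's list. Below is a minimal Java sketch, assuming a 0-based page number and the comma-separated `ids` parameter of Solr's real-time get handler. Sending the HTTP request and URL-encoding are left out. The IDs are assumed to be URL-safe, like the numeric ones in the question.

```java
import java.util.List;

class RealTimeGetPage {
    // Returns the IDs belonging to a 0-based page, in the order the user supplied them.
    static List<String> pageIds(List<String> orderedIds, int page, int rows) {
        if (page < 0 || rows <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and rows > 0");
        }
        int from = Math.min(page * rows, orderedIds.size());
        int to = Math.min(from + rows, orderedIds.size());
        return orderedIds.subList(from, to);
    }

    // Builds the query string for one page, e.g. "ids=a,b&fl=id".
    static String getParams(List<String> pageIds, String fl) {
        return "ids=" + String.join(",", pageIds) + "&fl=" + fl;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("2875141", "2873071", "2875198", "108142", "2918841");
        List<String> secondPage = pageIds(ids, 1, 2); // 2 rows per page
        System.out.println(getParams(secondPage, "id"));
        // prints ids=2875198,108142&fl=id
    }
}
```

For the IDs a to e with rows=2, page 1 yields c,d, page 2 yields only e, and page 3 is empty.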

melchi