5

I have a Solr index with document fields something like:

id, body_text, date, num_upvotes, num_downvotes

In my application, a document is created with some integer id and some body_text (500 chars max). The date is set to the time of input, and num_upvotes and num_downvotes begin at 0.

My application lets users upvote and downvote the content mentioned above. I want to track the votes in Solr instead of just the DB because I want the number of upvotes and downvotes to factor into my search.

This is a problem because you can't simply update a Solr document in place (i.e. increment num_upvotes); you must replace the entire document, which is probably fairly inefficient, since it would require hitting my DB to grab all the relevant data again.

I realize the solution may require a different layout of data, or possibly multiple indexes (although I don't know if you can query/score across Solr cores).

Is anyone able to offer any recommendations on how to tackle this?

DJSunny
  • I asked something like this: http://stackoverflow.com/questions/8411860/can-solr-boost-results-on-number-of-social-likes – Jesvin Jose Jan 16 '12 at 10:50

4 Answers

4

A solution I use for a similar problem is to update that information in the database and push updates/inserts to Solr every ten minutes, using the documents that were modified since the last update.

Also, every night, when I don't have much traffic, I optimize the index. I have also set up warm-up queries in the Solr config that run after each import.

My Solr index holds around 1.5 million documents; each document has 24 fields and around 2000 characters in total. Every 10 minutes I update around 500 documents (without optimizing the index) and run around 50 warm-up queries made up of the most common facets, the most used filter queries, and free-text searches.

I don't see a negative impact on performance (at least not a visible one): my queries average 0.1 seconds. Before I started updating every 10 minutes, the average was 0.09 seconds.

LATER EDIT:

I didn't encounter any problems during these updates. I always take the documents from the database and insert them into Solr with a unique key. If the document already exists in Solr, it is replaced (this is what I mean by update).

It never takes more than 3 minutes to update Solr. Actually, I take a 10-minute break after each update: I start the index update, wait for it to finish, and then wait another 10 minutes before starting again.

I did not look at performance overnight, but that is not relevant for me, as I want fresh data during peak user traffic.
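
The interval update described above can be sketched roughly as follows. This is only an illustration, not the setup used in this answer: the `documents` table, its `modified_at` column, and the helper names are assumptions, and SQLite stands in for whatever database the application uses. The payload is a plain JSON array of documents, a shape that recent Solr versions accept on the JSON update handler; older versions may need a different envelope.

```python
import json
import sqlite3


def fetch_modified(conn, since):
    """Return rows modified after `since` as Solr-ready dicts (hypothetical schema)."""
    rows = conn.execute(
        "SELECT id, body_text, date, num_upvotes, num_downvotes "
        "FROM documents WHERE modified_at > ? ORDER BY id",
        (since,),
    ).fetchall()
    fields = ("id", "body_text", "date", "num_upvotes", "num_downvotes")
    return [dict(zip(fields, row)) for row in rows]


def build_update_payload(docs):
    """Serialize the changed documents as a JSON array for a Solr update request."""
    return json.dumps(docs)


# Demo data in an in-memory database (an assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body_text TEXT, date TEXT, "
    "num_upvotes INTEGER, num_downvotes INTEGER, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "first post", "2011-11-16T10:00:00Z", 3, 1, "2011-11-16T10:05:00Z"),
        (2, "second post", "2011-11-16T10:01:00Z", 0, 0, "2011-11-16T10:01:00Z"),
    ],
)

# Timestamp of the previous sync; ISO-8601 strings compare correctly as text.
last_sync = "2011-11-16T10:02:00Z"
changed = fetch_modified(conn, last_sync)
payload = build_update_payload(changed)
```

Because each document is sent with its unique key, posting `payload` to the update handler replaces the existing copies, which matches the "update" behaviour described in this answer.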

Dorin
  • Thanks for the info. I've actually thought of doing that interval update approach, are you aware if there are issues with conflicts while those 10 minute updates are happening? (i.e. if those documents are "out" of the index briefly do queries being executed "miss" them?) - Also curious, how long does your `index optimize` take when you run it? Did you find that running this nightly improved performance non-trivially? – DJSunny Nov 18 '11 at 21:46
  • I have been looking for an answer like this for a long time – Jesvin Jose Jan 16 '12 at 10:52
  • This seems contradictory: you said "before doing update at every 10 minutes average queries were 0.09 seconds", and with the real-time update it's taking less, i.e. 0.1 seconds. How is that possible, given that Solr reindexes the whole index after every update to any document (remove and create again)? – Kumar-Sandeep May 13 '21 at 06:35
  • Also, your 10-minute update won't suit everyone, as it will affect your search queries every 10-15 minutes because of Solr's reindexing process, and if you run a heavy-load business it will hurt. It's better to keep the frequently updated part in a cache and merge that cache with the Solr results before returning the results of the search query. – Kumar-Sandeep May 13 '21 at 06:35
2

The Join feature would help you here. Then you could store the up/down votes in a separate document.

The bad news is that you need to wait until Solr 4 unless you're comfortable running with a trunk build.
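
As a rough sketch of what such a join might look like, assuming the votes live in separate documents with a hypothetical `doc_id` field pointing at the content document's `id`, the join query parser can be used as a filter. The field names and the `join_filter` helper are assumptions. As far as I know, a join used this way only restricts which documents match and does not by itself feed the vote counts into the relevance score.

```python
from urllib.parse import urlencode


def join_filter(min_upvotes, from_field="doc_id", to_field="id"):
    """Build a Solr join filter selecting content whose vote document has enough upvotes."""
    return "{!join from=%s to=%s}num_upvotes:[%d TO *]" % (
        from_field,
        to_field,
        min_upvotes,
    )


# Request parameters for a select query; the main query text is just an example.
params = {"q": "body_text:solr", "fq": join_filter(10), "wt": "json"}
query_string = urlencode(params)
```

With this layout an upvote only rewrites the small vote document, so the content document and its `body_text` never need to be reindexed.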

brian519
  • Thanks for the tip. Any idea how stable the trunk of Solr 4 is? (or any idea when the Solr 4 release would be) – DJSunny Nov 16 '11 at 16:12
  • I was wondering the same things a few days ago. From what I remember of my google searches, there are some people using Solr 4 in production. It's really tough to pin down a release date for an open source project, but I remember seeing someone guess 8 months out. You can see what issues are still open here: https://issues.apache.org/jira/browse/SOLR/fixforversion/12314992#atl_token=A5KQ-2QAV-T4JA-FDED%7C998a6b54a3f89920a488573221c1192d2e78926c%7Clout&selectedTab=com.atlassian.jira.plugin.system.project%3Aversion-issues-panel – brian519 Nov 16 '11 at 17:12
1

If you are only going to update the up/down votes, then instead of going back to the database, use the appropriate Solr client for your application to pull the document from the index, set the up/down values as needed, and reinsert the document into the index.
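
A minimal sketch of that read-modify-write step, assuming the document has already been fetched from Solr as a plain dict (the fetch and the final post are left to whichever client you use). `REQUIRED_FIELDS` and `apply_vote` are hypothetical names. The check matters because a field that is not stored will be missing from the fetched document, and reinserting without it would drop that field from the index.

```python
import json

# Fields the reinserted document must carry (taken from the question's schema).
REQUIRED_FIELDS = ("id", "body_text", "date", "num_upvotes", "num_downvotes")


def apply_vote(stored_doc, up=0, down=0):
    """Return a copy of the fetched document with its vote counts adjusted."""
    missing = [f for f in REQUIRED_FIELDS if f not in stored_doc]
    if missing:
        raise ValueError(
            "cannot rebuild document, fields not stored: %s" % ", ".join(missing)
        )
    updated = dict(stored_doc)  # leave the fetched copy untouched
    updated["num_upvotes"] += up
    updated["num_downvotes"] += down
    return updated


# A document as it might come back from a select query (example data).
fetched = {
    "id": 42,
    "body_text": "hello",
    "date": "2011-11-16T10:00:00Z",
    "num_upvotes": 3,
    "num_downvotes": 1,
}
reindexed = apply_vote(fetched, up=1)
payload = json.dumps([reindexed])  # body for the subsequent update request
```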

Paige Cook
  • The issue is that in Solr we can set fields to `stored=false` to prevent bloating, especially at scale. So if I'm not storing the body_text, I won't be able to pull it from Solr and reinsert. – DJSunny Nov 16 '11 at 15:47
0

There is no solution to your problem within Solr. You have a database problem and you are trying to solve it with a search engine.

The best way to deal with this is to keep a Redis database that records the Solr document id and the up/down vote counts. Then your app can merge the data from both sources before displaying.
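
The merge step might look something like the sketch below. To keep it runnable without a server, a plain dict stands in for Redis; the key layout (document id mapped to `up`/`down` counts) and the `merge_votes` helper are assumptions, not part of any Redis or Solr API.

```python
def merge_votes(solr_docs, vote_store):
    """Attach vote counts from the key-value store to each Solr search hit."""
    merged = []
    for doc in solr_docs:
        # Documents with no recorded votes default to zero.
        counts = vote_store.get(str(doc["id"]), {})
        merged.append(
            {
                **doc,
                "num_upvotes": int(counts.get("up", 0)),
                "num_downvotes": int(counts.get("down", 0)),
            }
        )
    return merged


# Stand-in for the vote store: keys are document ids, values are vote counters.
vote_store = {"1": {"up": "7", "down": "2"}}

# Stand-in for Solr search results, in relevance order.
solr_docs = [{"id": 1, "body_text": "first post"}, {"id": 2, "body_text": "second post"}]

results = merge_votes(solr_docs, vote_store)
```

Note that the results keep Solr's ordering, so with this layout the votes are available for display but do not influence ranking unless the app re-sorts the merged list itself.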

Michael Dillon