
I wanted to set up a system where a new item gets indexed in Solr as soon as it is created in the database, to avoid the delay of a few minutes that comes with time-based delta polling. So I tweaked the delta import a little and made it work based on a query parameter. In my C# code, when a new item is saved, I construct a delta-import URL, pass the newsid to be indexed, and invoke it with HttpWebRequest. The delta query then fetches the details from the db and indexes the item.

http://localhost:89983/solr/mycore/dataimport?command=deltaimport&clean=false&newsid=1234

This works as expected. The issue appears when the flow of news gets higher, say 5 news items at a time. The code hits the URL for each item in a loop, but the requests arrive so quickly that only the first item, or sometimes the first two, gets indexed. The rest are missed.
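A minimal Python sketch of the flow described above (the real code is C# using HttpWebRequest, so this is only an illustration). The function names `build_delta_url` and `index_all` and the injected `fetch` callable are hypothetical; the query parameters are the ones from the URL shown earlier.

```python
from urllib.parse import urlencode


def build_delta_url(core_url, news_id):
    """Build the delta-import URL for one news id (parameters as in the question)."""
    params = {"command": "deltaimport", "clean": "false", "newsid": news_id}
    return f"{core_url}/dataimport?{urlencode(params)}"


def index_all(core_url, news_ids, fetch):
    """Fire one request per id, back-to-back, with no waiting in between.

    `fetch` is a hypothetical callable that performs the HTTP GET and
    returns the status code; it is injected so the loop can be exercised
    without a running Solr instance.
    """
    return [fetch(build_delta_url(core_url, nid)) for nid in news_ids]
```

Each call returns as soon as the request is accepted, so the loop itself gives no indication of whether the import for a given id actually ran.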

So I believe that Solr can't handle multiple delta-import hits at nearly the same time. How can I overcome this?
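One way to test that belief is to serialize the calls: wait until the import handler reports that it is idle before sending the next id. The sketch below is in Python for brevity and rests on assumptions I have not verified against this setup: that `command=status` with `wt=json` returns a JSON body whose `status` field is `idle` or `busy`, and that an import request sent while the handler is busy is not executed. The helper names (`http_get_json`, `wait_until_idle`, `import_sequentially`) are hypothetical.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """GET a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_idle(core_url, get_json=http_get_json, poll_seconds=0.2, max_polls=50):
    """Poll the handler status; return True once it reports 'idle' (assumed field/value)."""
    query = urlencode({"command": "status", "wt": "json"})
    status_url = f"{core_url}/dataimport?{query}"
    for _ in range(max_polls):
        if get_json(status_url).get("status") == "idle":
            return True
        time.sleep(poll_seconds)
    return False


def import_sequentially(core_url, news_ids, get_json=http_get_json, poll_seconds=0.2):
    """Send one delta import per id, waiting for the previous one to finish first."""
    sent = []
    for nid in news_ids:
        if not wait_until_idle(core_url, get_json, poll_seconds):
            break  # handler never became idle; stop rather than drop ids silently
        params = {"command": "deltaimport", "clean": "false", "newsid": nid, "wt": "json"}
        get_json(f"{core_url}/dataimport?{urlencode(params)}")
        sent.append(nid)
    return sent
```

If all ids get indexed this way, that would support the idea that overlapping requests are being dropped rather than failing inside the delta query.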

Arjun_TECH
  • If you're already calling out to Solr when you get a news item, why aren't you just submitting the new document directly? That will be _far_ more performant. – MatsLindh Sep 07 '18 at 18:13
  • Could you also check whether all 5 queries are reaching Solr? – Aman Tandon Sep 10 '18 at 04:59
  • Well yes, for each request I get a 200 OK response. Is that enough to confirm? – Arjun_TECH Sep 11 '18 at 12:33
  • @MatsLindh yes, that's also a way, but as I am fitting this solution into an existing system, I would need to make changes in multiple places to fetch the columns required for indexing in the same call where I get the newsid. Alternatively, I could use the newsid to get the news data in another db call and construct a Solr document, but that costs me another db call, and a major hurdle is that we strip HTML from the news body before indexing, which I would need to implement in C# code (I didn't find any lib for it). As Solr itself provides an HTML stripping module, I chose to let Solr do the whole thing. – Arjun_TECH Sep 11 '18 at 12:39
  • What do the Solr logs say? Do you see all of the update requests being recorded? I do agree with @MatsLindh. Just send it directly to Solr. Calling a delta import manually just adds more complexity and degrades performance – Binoy Dalal Sep 12 '18 at 00:13
  • Hi @BinoyDalal, as I mentioned, sending the doc directly to Solr is not feasible in my case. The reason is that the content of the whole document is not required in Solr, and it is raw content, while in Solr we index only processed content from the db. And just a thought on performance: since the full import works so efficiently that it indexes roughly 15K rows in a minute (around 200 news/sec), would it be wise to assume that a delta import for a given news id is a complex or expensive process? – Arjun_TECH Sep 14 '18 at 10:59

0 Answers