
Some time ago I asked about implementing cosine similarity and, as recommended, chose the Apache Lucene engine for it ( Cosines similarity on large data sets ). I am still dealing with performance issues, because the results are almost the same as with the previous implementation ( see link ).

Since I'm quite new to Java, apologies if the problem turns out to be in my Java code rather than in Lucene itself.

The database, which contains about 250k documents, is indexed in Lucene using MMapDirectory. I then iterate over the whole database and compare each document to a single document that I prepared in advance, before the iteration starts.
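
To make the comparison step concrete, here is a minimal sketch of cosine similarity between sparse term-weight vectors, with the norm of the fixed document computed once up front. It is plain Java and does not use Lucene; the class name `SparseCosine` and the `Map<String, Double>` representation are assumptions for illustration, not the code from my implementation.

```java
import java.util.Map;

// Hypothetical helper: cosine similarity over sparse term -> weight maps.
final class SparseCosine {

    // Euclidean norm of a sparse vector.
    static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine between a fixed query vector (norm precomputed by the caller)
    // and one document vector. Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Double> query, double queryNorm,
                         Map<String, Double> doc) {
        double docNorm = norm(doc);
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0;
        }
        // Iterate over the smaller map so the loop touches fewer terms.
        Map<String, Double> small = query.size() <= doc.size() ? query : doc;
        Map<String, Double> large = (small == query) ? doc : query;
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : small.entrySet()) {
            Double other = large.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        return dot / (queryNorm * docNorm);
    }
}
```

With this shape, the work repeated for each of the 250k documents is only the document norm and the dot product; anything that depends solely on the prepared document stays outside the loop.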

I noticed that most of the time is spent in the indexReader.docFreq method, and I have no idea how to optimize it, since I haven't found any tools to monitor and detect what is causing the slowness.
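
One thing that might be worth trying, if the same terms are looked up again and again across documents, is to cache document frequencies so each distinct term hits the index only once. The sketch below is an assumption about how that could look, not a measured fix: `DocFreqCache` is a hypothetical class, the `ToIntFunction<String>` stands in for whatever wraps the real indexReader.docFreq call, and the IDF formula is one common smoothed form that may differ from what my code uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical memoizing wrapper around an expensive document-frequency lookup.
final class DocFreqCache {
    private final Map<String, Integer> cache = new HashMap<>();
    private final ToIntFunction<String> lookup; // stand-in for the real index call
    private int misses = 0;

    DocFreqCache(ToIntFunction<String> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached document frequency, calling the lookup only on a miss.
    int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached == null) {
            cached = lookup.applyAsInt(term);
            cache.put(term, cached);
            misses++;
        }
        return cached;
    }

    // Assumed smoothed IDF: 1 + ln(numDocs / (docFreq + 1)).
    double idf(String term, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq(term) + 1.0));
    }

    // Number of times the underlying lookup was actually invoked.
    int misses() {
        return misses;
    }
}
```

The cache is only valid while the index does not change, and it is not thread-safe as written. Comparing `misses()` with the total number of lookups would show whether repeated terms are really the problem.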

I thought it could be disk I/O, since I chose to index on the file system rather than in RAM, but Windows Resource Manager didn't show any data that would confirm my guess. How can I find the bottlenecks that cause these performance issues?
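
As a rough first check before reaching for a profiler, the phases of the loop can be timed by hand with System.nanoTime. The following is a minimal sketch under that assumption; `PhaseTimer` and the phase names are hypothetical, and wall-clock totals like these only show where time goes, not why.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that accumulates wall-clock time per named phase.
final class PhaseTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Runs the work and adds its elapsed time to the phase total,
    // even if the work throws.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            totals.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Accumulated nanoseconds for a phase, or 0 if it was never timed.
    long totalNanos(String phase) {
        return totals.getOrDefault(phase, 0L);
    }

    // One line per phase, in the order the phases were first seen.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            sb.append(String.format("%-12s %10.3f ms%n",
                    entry.getKey(), entry.getValue() / 1_000_000.0));
        }
        return sb.toString();
    }
}
```

Wrapping the frequency lookups in something like `timer.time("docFreq", ...)` and the scoring in `timer.time("scoring", ...)` would give per-phase totals for the whole run. The timing calls add overhead of their own, so very fine-grained phases will look slower than they are.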

Also, maybe there are already some tips or solutions that I missed when searching on Google.

Update:

For the implementation I followed the example provided in this post.

  • The whole point of using Lucene is that it has implemented cosine similarity already. What does your current implementation look like? Most decent Java Profilers should be able to give you low level information about exactly where your code is spending time (and you probably want to compile Lucene yourself in that case). You can also implement a custom similarity class for Lucene if you want to - search the docs for Similarity. – MatsLindh Aug 06 '15 at 08:13
  • @MatsLindh, thanks for the answer. I have updated my question with the implementation source. You mentioned that Lucene has already implemented cosine similarity. Should it be faster than my current implementation? If I understand you correctly, my implementation is "custom", right? For profiling I have found YourKit (https://www.yourkit.com); I hope it helps. – deividaspetraitis Aug 06 '15 at 10:29

0 Answers