
In my web app (built with Spring, Hibernate 4, JPA 2.1, and PostgreSQL 9.3), I need to provide the following functionality:

  1. Text search across multiple database tables, each with a few tens of thousands of rows
  2. Text search inside files: doc, xls, pdf, htm (a few tens of thousands of files)
  3. Spatial search/indexing: finding entities within a radius of x km from a point

I found several options but am not able to weigh their pros and cons:

  • Spring Data Solr - Seems to cover all 3 of the above, but indexing is not real time
  • Hibernate Search - Uses Lucene only. 1 and 3 work, and indexes are updated automatically, but I am not sure whether 2 is supported, as I could not find anything about it in the documentation.
  • Hibernate Spatial - I don't know whether the spatial support in Hibernate Search is the same as this
  • Solr and Hibernate Search combined, to get the best features of both, but I could not find more information on this path

Which option would support all of my requirements? If someone could point out the pros and cons of each, that would be a big help in making the decision.

Since data will be added very frequently in my app, real-time indexing would be a big plus.

AAgg
  • Not a single comment, up/down vote after 1 day, I must have asked a silly question :( – AAgg Oct 05 '14 at 04:17

1 Answer


Disclaimer: I am one of the developers of Hibernate Search, but I also contribute to Lucene and Solr, as we rely on them and love them.

Hibernate Search includes the same technology as Solr. The main difference is that Hibernate Search embeds it in your application, while Solr is usually run as a standalone service.

The benefit of a standalone service like Solr is that it can serve as an integration point for other, non-Java services; the downside is that you have to manage and maintain another service. You also have to integrate Solr with your application yourself, whereas Hibernate Search does that integration for you: it embeds Apache Lucene (the technology Solr is built on) and applies index changes automatically by listening to Hibernate events.
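To make the "embedded index updated by listening to events" idea concrete, here is a toy sketch in plain Java. It is not Hibernate Search or Lucene code: `ChangeListener`, `EmbeddedIndex`, and `Store` are hypothetical names invented for illustration, and the "index" is just an in-memory map from terms to ids. It only shows the shape of the mechanism: the application writes through its usual store, and the index stays current without any explicit indexing call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration of event-driven indexing; all names here are hypothetical. */
class EventDrivenIndexSketch {

    /** Stands in for the persistence events an ORM emits on writes. */
    interface ChangeListener {
        void onSave(long id, String text);
        void onDelete(long id);
    }

    /** Minimal inverted index living inside the application process. */
    static class EmbeddedIndex implements ChangeListener {
        private final Map<String, Set<Long>> postings = new HashMap<>();
        private final Map<Long, String> stored = new HashMap<>();

        @Override
        public void onSave(long id, String text) {
            onDelete(id); // drop stale terms before re-indexing an updated entity
            stored.put(id, text);
            for (String term : tokenize(text)) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }

        @Override
        public void onDelete(long id) {
            String old = stored.remove(id);
            if (old == null) {
                return;
            }
            for (String term : tokenize(old)) {
                Set<Long> ids = postings.get(term);
                if (ids != null) {
                    ids.remove(id);
                    if (ids.isEmpty()) {
                        postings.remove(term);
                    }
                }
            }
        }

        /** Returns a sorted copy of the ids whose text contains the term. */
        Set<Long> search(String term) {
            String key = term.toLowerCase(Locale.ROOT);
            return new TreeSet<>(postings.getOrDefault(key, Collections.emptySet()));
        }

        private static List<String> tokenize(String text) {
            List<String> terms = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
            return terms;
        }
    }

    /** Hypothetical data store that notifies its listeners on every write. */
    static class Store {
        private final List<ChangeListener> listeners = new ArrayList<>();

        void register(ChangeListener listener) {
            listeners.add(listener);
        }

        void save(long id, String text) {
            for (ChangeListener l : listeners) {
                l.onSave(id, text);
            }
        }

        void delete(long id) {
            for (ChangeListener l : listeners) {
                l.onDelete(id);
            }
        }
    }

    public static void main(String[] args) {
        EmbeddedIndex index = new EmbeddedIndex();
        Store store = new Store();
        store.register(index);

        store.save(1L, "Annual report, PDF");
        store.save(2L, "Quarterly report");
        System.out.println(index.search("report"));

        store.delete(1L);
        System.out.println(index.search("report"));
    }
}
```

With a standalone server, the equivalent of `onSave` and `onDelete` would be remote calls that your own code has to make and keep consistent with your transactions.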

Hibernate Search can fulfill all three requirements, including filtering within a radius and real-time indexing; indexing the documents has to go through its integration with Apache Tika.
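For readers unfamiliar with what "filtering within a radius" computes, below is a minimal plain-Java sketch of a great-circle (haversine) distance check. It is an illustration only, not how Hibernate Search or Lucene implements spatial queries (a real index avoids scanning every entity). The `Place` class, the `within` helper, and the sample coordinates are assumptions made up for the example, and the Earth is approximated as a sphere of radius 6371 km.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Brute-force radius filter; a sketch of the concept, not a library API. */
class RadiusFilterSketch {

    /** Mean Earth radius in km (spherical approximation). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Hypothetical entity with a name and a coordinate in degrees. */
    static class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding error near antipodal points.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Keeps the places lying within radiusKm of the centre, in input order. */
    static List<Place> within(List<Place> places, double lat, double lon, double radiusKm) {
        List<Place> hits = new ArrayList<>();
        for (Place p : places) {
            if (haversineKm(lat, lon, p.lat, p.lon) <= radiusKm) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample coordinates, approximate and for illustration only.
        List<Place> places = Arrays.asList(
                new Place("Paris", 48.8566, 2.3522),
                new Place("Versailles", 48.8049, 2.1204),
                new Place("London", 51.5074, -0.1278));

        for (Place p : within(places, 48.8566, 2.3522, 25.0)) {
            System.out.println(p.name);
        }
    }
}
```

A linear scan like this is fine for a handful of rows; for tens of thousands of entities queried frequently, the point of a spatial index is to narrow the candidates before any exact distance is computed.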

Hibernate Spatial is usually applied when you have more complex geometries than a simple distance/radius criterion, and it is currently not integrated with full-text indexing, so I would suggest using the spatial functionality of Hibernate Search (which is unrelated to Hibernate Spatial).

The main drawback of Hibernate Search is obvious: it requires your application to use Hibernate, as its main function is to listen to the events generated by update transactions. It provides the same underlying technology as Solr, so there isn't much to debate about which is "better", other than the significant architectural difference between an embedded technology and a separate REST-based server. Each has benefits and drawbacks, but these depend far more on the other factors of your architecture than on the plain functionality provided. In a future version we plan to support sending events to a standalone Solr server, so that you'll eventually be able to choose how to set up your architecture without needing to change how you model your domain and your application logic.

Sanne
  • Thanks for sharing this info, Sanne. If I may ask, can you also mention a few limitations of Hibernate Search vs Solr that you are aware of as a Hibernate Search developer? This would help others compare these two technologies as well. – AAgg Oct 06 '14 at 13:39
  • Good point, added that. HTH – Sanne Oct 06 '14 at 20:03
  • This is a perfect answer now for all my current queries, thanks a ton. – AAgg Oct 07 '14 at 03:04