
I have a project that is being written on top of the Grape API framework in Ruby (https://github.com/intridea/grape).

The problem I'm having is that Thinking-Sphinx and Sunspot (the gems used to interface with each search index) have wildly different benchmarks. View the benchmark here.

We're trying to develop something that is quick and easy to deploy (Solr needs Java).

The main issue we see right now is that Solr is slower through the Sunspot gem and Sphinx is faster through Thinking-Sphinx, because Solr is reached over HTTP REST calls whereas Sphinx uses sockets.

Does anyone have experience with either and can explain the pitfalls and benefits?

Note: It needs to be deployable to Rails AND non-Rails apps (hence Sunspot).

Thanks!

Glen Solsberry
Robert Ross
  • Should a dependency on Java be an issue for software like this? The advantage of REST calls is that they are easier to implement in most languages, so you may be more flexible using Solr. I've tried both, but only in a couple of small experiments. I can't really advise using one or the other... – GolezTrol Aug 31 '11 at 23:08
  • Sphinx is faster than Solr because it does less analysis, it doesn't support updating the index, and it loads information all at once in a single run (or using the always error-prone delta index merges). If you don't need live updates to your index, don't have complex requirements for indexing data, and don't need any of the fancy plugins available for Solr, then Sphinx might be an option. And if you think Solr/Sunspot integration is complex, here's proof that it isn't - http://techbot.me/2011/01/full-text-search-in-in-rails-with-sunspot-and-solr/ – Maurício Linhares Sep 01 '11 at 03:15
  • And one thing that puzzles me in the Ruby community is this **Solr needs Java**. What's wrong with Java, really? You can easily install a JVM anywhere (in many more places than you can install Ruby, in fact). I'm not sure how much "harder" it can be to handle a daemon process proven by 6 years of use in the wild, running inside the **best** virtual machine publicly available on the market today. – Maurício Linhares Sep 01 '11 at 03:19
  • @Maurício: you should really make an answer out of those great comments! – Mauricio Scheffer Sep 02 '11 at 02:34
  • It's worth noting someone else asked a very similar question the other day: http://stackoverflow.com/questions/7227176/performance-difference-between-sunspot-and-thinking-sphinx – pat Sep 02 '11 at 22:39
  • As for the Java argument: personally, I see it as another thing to worry about, which is why I don't rush to install it. If you've come from Java, this is likely not an issue. Java's not a bad thing, after all. – pat Sep 02 '11 at 22:41

2 Answers


Sphinx is easier to set up and get working, and it provides most of the flexibility you may want.

Solr is more fully featured and scales better to larger data sets, but it may be harder to configure and get working the way you'd like.

We've been using Sphinx for years at PatientsLikeMe and I wish we had originally selected Solr. I regret not having more complex weighting and sorting options. On the other hand, Sphinx was a lot easier to get set up initially.

EDIT: I've had even better luck with ElasticSearch, which is easy to set up, scales well, and has all of the features of Solr. I would strongly recommend ElasticSearch over either Sphinx or Solr for anyone.

Winfield

As noted in the comments of that post, that benchmark is next to useless. Not only does it fail to mention the parameters of the index and the queries used, but it also seems to benchmark only transfer speed, not actual search speed.

Please note that I'm not saying that Solr is faster / better than Sphinx, I'm just saying that I'd never use that post and that benchmark to make a decision about a search engine for my applications.
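If you want numbers you can actually trust, a minimal sketch of a more transparent benchmark might look like the following. It uses only Ruby's standard `Benchmark` module; `time_search`, `DOCS`, and `fake_search` are hypothetical names made up for illustration, and the lambda is a stand-in that you would replace with a real call into Sunspot or Thinking-Sphinx against your own index and your own queries.

```ruby
require "benchmark"

# Hypothetical helper: times a search callable over a fixed list of queries.
# `search` is any object responding to #call(query); it is a stand-in here,
# not a real Sunspot or Thinking-Sphinx API.
def time_search(search, queries, warmup: 2, runs: 5)
  # Warm-up passes so connection setup and cold caches do not skew results.
  warmup.times { queries.each { |q| search.call(q) } }

  # One wall-clock sample per run, each covering the full query list.
  samples = Array.new(runs) do
    Benchmark.realtime { queries.each { |q| search.call(q) } }
  end

  sorted = samples.sort
  {
    runs: runs,
    queries_per_run: queries.size,
    min: sorted.first,
    median: sorted[sorted.size / 2], # middle sample (upper middle if runs is even)
    max: sorted.last
  }
end

# Stand-in backend (assumption): an in-memory substring match.
DOCS = ["solr needs java", "sphinx uses sockets", "grape api framework"].freeze
fake_search = ->(query) { DOCS.select { |doc| doc.include?(query) } }

report = time_search(fake_search, %w[solr sphinx grape])
puts report
```

Reporting the number of runs, the query list, and the spread between minimum and maximum alongside the timings is what makes a benchmark like this reproducible, which is exactly what the linked post lacks.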

By the way, it doesn't really matter much, but Solr's HTTP interface is not REST at all.

Also, if you do run your benchmarks properly and determine that you absolutely need top performance and that the bottleneck is XML serialization (in the real world, this almost never happens), you can take a look at this javabin implementation in Ruby.

Mauricio Scheffer