68

I'm looking for a good open source (with LGPL or a permissive license) indexing engine for a node.js application, something like Lucene. I'm looking for in-process indexing and search and am not interested in indexing servers like Sphinx or Solr.

I am not afraid to create bindings for a C/C++ library either, so I'm open to those kinds of suggestions as well.

So far I've found

  • node-clucene which doesn't seem to be actively maintained anymore (and has several open issues)
  • I could create my own binding for CLucene but it seems to be quite sparsely maintained and its current version is also quite behind the Java Lucene
  • Apache Lucy which seems to be designed for the purpose of creating bindings for dynamic languages, but so far they don't have node bindings (nor a C API) and I haven't found any docs about creating bindings. I also didn't find any benchmarks about its performance.
  • node-search which seems to be abandoned
  • jsii which seems to be still a prototype and is also abandoned
  • fullproof which is only intended to run in a web browser
  • lunr.js which seems to only allow serializing the whole index, so isn't scalable

I could "roll my own", but I'd prefer to use an already existing solution.

EDIT: Why I'm not interested in a standalone index server: I use a fast in-process key-value store database, so it'd be quite a waste having to go out of process for querying.

Venemo

4 Answers

16

Just an update to my earlier answer - since there was so much discussion I didn't want this update to get lost.

You can download it here:

SiddAjmera
Fergie
  • Short answer: no; Longer answer: Forage sorts on document relevance, and gives the owner simple, yet powerful control over how relevance is determined; Longest answer: Forage has not supported sorting on abstract fields because that has been seen as outwith the core scope of the project. However, probably at some point in the future sort functionality will be added, since there is a demand for it. – Fergie Aug 28 '13 at 11:02
  • How do you calculate document relevance? – Venemo Aug 28 '13 at 19:13
  • Woah there @user2020565! Forage is fully accessible from multiple processes :) – Fergie Jul 17 '14 at 09:23
  • Thanks @Fergie -- I just assumed that was the case based on the technologies behind it. I am going to look more into forage. – noderman Jul 18 '14 at 14:04
  • 3
    [It appears](https://github.com/fergiemcdowall) you are affiliated with norch. Please note that our [self-promotion policy](http://meta.stackexchange.com/questions/57497) requires you to disclose this information in answers like this one and that you not mention the product in a huge percentage of your posts. – josliber Dec 05 '15 at 20:58
  • 3
    @josliber I hesitate to rise to this but for the record: Norch is an open source project that I and others use our spare time on for the good of humanity, since it is non-commercial it happens to be under my GitHub username. As for "a huge percentage of posts", that's just not true, but we do of course do a (very small) amount of work with awareness, which amounts to a handful of posts on stackoverflow. – Fergie Mar 09 '16 at 14:26
14

Yes, check out the newly released Norch

Norch is based on the search-index module for node.js, which is in turn built on LevelDB, Google's embedded key-value store.

EDIT: Use the search-index module for fast "in-process" search capability.
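
To make the in-process idea concrete, here is a minimal sketch of an inverted index laid out over a key-value store, one key per term. It is not the search-index API or its implementation: `KvInvertedIndex`, `tokenize`, the `term:` key prefix, and the `Map` standing in for a LevelDB-like store are all assumptions made for illustration, and the scoring is deliberately naive.

```javascript
// Sketch only: an inverted index over a key-value store.
// A Map stands in for the store; no name here comes from search-index.

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

class KvInvertedIndex {
  constructor(store = new Map()) {
    this.store = store; // hypothetical stand-in for a LevelDB-like store
  }

  // Read the postings (doc id -> term frequency) stored under one term key.
  readPostings(term) {
    return new Map(JSON.parse(this.store.get('term:' + term) || '[]'));
  }

  // Index one document. Only the keys for its own terms are written,
  // so the rest of the index is never re-serialized.
  add(id, text) {
    const counts = new Map();
    for (const term of tokenize(text)) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
    for (const [term, tf] of counts) {
      const postings = this.readPostings(term);
      postings.set(id, tf);
      this.store.set('term:' + term, JSON.stringify([...postings]));
    }
  }

  // AND query: ids that contain every query term, ranked by summed
  // term frequency, with ties broken by id.
  search(query) {
    const terms = [...new Set(tokenize(query))];
    let scores = null;
    for (const term of terms) {
      const postings = this.readPostings(term);
      if (scores === null) {
        scores = postings;
        continue;
      }
      for (const [id, score] of scores) {
        if (postings.has(id)) scores.set(id, score + postings.get(id));
        else scores.delete(id);
      }
    }
    if (scores === null) return [];
    return [...scores.entries()]
      .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
      .map(([id]) => id);
  }
}

// Usage
const index = new KvInvertedIndex();
index.add('a', 'Lucene is a search library');
index.add('b', 'Search search search in node');
console.log(index.search('search'));      // [ 'b', 'a' ]
console.log(index.search('search node')); // [ 'b' ]
```

Because each term lives under its own key, adding a document rewrites only that document's terms rather than the whole index.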

Fergie
  • What does `Norch` add on top of `search-index`? – Venemo Jul 08 '13 at 15:19
  • Norch makes search-index available on HTTP and adds a few other GUI things. – Fergie Jul 14 '13 at 21:11
  • That being said, `search-index` might be of use to me but it seems to be too closely tied with LevelDB :( – Venemo Jul 15 '13 at 08:45
  • Why is LevelDB 'bad'? – Fergie Jul 16 '13 at 08:05
  • It's not LevelDB that's bad, it's tight coupling that's bad. Besides, there are better alternatives to LevelDB, see http://symas.com/mdb/microbench/ :) – Venemo Jul 17 '13 at 07:50
  • search-index is based on levelUP which can be plugged into 'any' backend by switching out the levelDOWN dependency. It's about as loosely coupled to the underlying index as you're going to get. See http://r.va.gg/2013/06/leveldown-alternatives.html for a quick overview – Fergie Aug 12 '13 at 09:03
  • In the meantime I took a look at `search-index` :) I like your code, but it doesn't seem to support sorting the result set. At least the API doc doesn't say how I can tell it which field I want to sort by. Can you give some details about that? – Venemo Aug 13 '13 at 07:44
  • 3
    [It appears](https://github.com/fergiemcdowall) you are affiliated with norch. Please note that our [self-promotion policy](http://meta.stackexchange.com/questions/57497) requires you to disclose this information in answers like this one and that you not mention the product in a huge percentage of your posts. – josliber Dec 05 '15 at 20:58
11

Can you explain why you're not interested in using an external index? For full text search I always revert to using PostgreSQL's full text indexing capabilities - it's very fast, indexing doesn't require a full-index-update (like Solr does), and results are returned faster than Lucene based solutions (such as Elastic Search).

But if you really want to do it in-process, you probably want to look at Lunr: http://lunrjs.com/ - it does work in Node, not just in the browser.

Edit: Here's where I got my stats on Postgres being faster than Lucene: http://fr.slideshare.net/billkarwin/full-text-search-in-postgresql - see Slide 49.

Edit: Not sure what kind of speed you're looking at for in/out of process, but our PostgreSQL database can do 100k queries per second without breaking a sweat, and it's not even on SSDs. Perhaps you're over-thinking your performance needs - after all once you need to go to multiple nodes (or using cluster to take advantage of all CPUs) you will need to dump in-process anyway.

Matt Sergeant
  • 2
    " and results are returned faster than Lucene based solutions (such as Elastic Search)." Any benchmarks to back that up? I'm almost certain most reviews would have it the other way around. – Geert-Jan May 19 '13 at 18:07
  • 1
    I use a very fast, in-process database for its speed. Thus, having an out-of-process index would make it quite ridiculous. – Venemo May 19 '13 at 20:12
  • 1
    I looked at lunr, yes, but it currently doesn't support persisting the index without having to serialize the whole index all the time. – Venemo May 19 '13 at 20:14
  • 1
    +1 for recommending using fti especially if your data source is an rdbms. sometimes the solution nearer to hand can get you out of a tight spot. – booyaa Jul 16 '13 at 08:34
  • @booyaa My data source is not an RDBMS. – Venemo Aug 01 '13 at 18:47
2

Full Text Search Light is a Node module, written in pure JS, for doing full-text searches. You can find the current git repository here: https://github.com/frankred/node-full-text-search-light

Frank Roth