48

The Ruby folks have Ferret. Does anyone know of a similar initiative for Python? We're currently using PyLucene, but I'd like to investigate moving to pure-Python searching.

icedwater
PEZ
  • Probably not an answer to the question, but Elasticsearch implements a simple web interface on top of Lucene, and PyES is a Python wrapper over Elasticsearch. I have used PyES comfortably, but some advanced features present in Lucene are still missing from Elasticsearch. – amit kumar Sep 06 '11 at 06:32
  • By the way, the old Ferret URL redirects now to http://www.chandanweb.com/solutions/web-applications.html - I've replaced the URL with the new github page https://github.com/dbalmain/ferret :) – icedwater Sep 05 '13 at 04:12
  • For accessing Lucene indices I found (and am trying out) `plush`: https://pypi.python.org/pypi/plush/0.3.0 – icedwater Sep 05 '13 at 04:33
  • Any reason for going for pure Python? – avi Feb 06 '14 at 05:47

8 Answers

44

Whoosh is a new project that is similar to Lucene but written in pure Python.
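To give a rough sense of what "pure Python searching" means at its core, here is a minimal sketch of an inverted index using only the standard library. It is not Whoosh's API or anything close to a full engine; the `TinyIndex` class, the `tokenize` helper, and the sample documents are all hypothetical names made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical toy inverted index: term -> set of ids of docs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record the document id under every term that occurs in the text.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries for terms that were never indexed.
        sets = [self.postings.get(term, set()) for term in terms]
        return sorted(set.intersection(*sets))


# Made-up sample documents, for illustration only.
index = TinyIndex()
index.add(1, "Ferret is a search library for Ruby")
index.add(2, "PyLucene wraps Lucene for Python")
index.add(3, "A pure Python search library")

print(index.search("search library"))  # [1, 3]
print(index.search("python"))          # [2, 3]
```

A real engine adds ranking, stemming, phrase queries, and on-disk storage on top of this idea, which is where most of the speed difference against C or Java implementations tends to show up.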

A. Coady
6

The only pure-Python search solution (with no C extensions at all) that I know of is Nucular. It is slow (much slower than PyLucene) and still unstable.

We moved from a home-baked PyLucene-based search and indexing setup to Solr, but YMMV.

zgoda
4

I recently found pyndexter. It provides an abstract interface to various backend full-text search engines and indexers, and it ships with a default pure-Python implementation.

These things can be disastrously slow in Python, though.

Ali Afshar
  • I came here looking for something to access Lucene indices in Python, and I'm not too concerned about speed at this point. I just don't want to be tied to Java. So thanks for the pynter. – icedwater Sep 05 '13 at 04:26
  • Last release of pyndexter was 2007 and the link provided here is dead, unfortunately. – webtweakers Nov 15 '16 at 13:39
3

For some applications pure Python is overrated. Take a look at Xapian.

Ignacio Vazquez-Abrams
  • Thanks for the Xapian mention. Not what I need right now, but I'll sure keep it in mind for later. – PEZ Jan 13 '09 at 22:55
2

For non-pure Python, Sphinx Search with its Python API is the fastest option. According to benchmarks on multiple blogs, Sphinx Search is much faster than Lucene, uses far less memory, and is written in C++.

I am developing a multi-document search engine based on it, using Python with web2py as the framework.

icedwater
Phyo Arkar Lwin
2

Lupy was a port of Lucene to pure Python. The Lupy people suggest that you use PyLucene instead. Sorry. Maybe you can use the Java sources in combination with Jython.

Yuval F
  • It's interesting that Ferret seems to be very appreciated and used while Lupy was abandoned. – PEZ Jan 13 '09 at 09:22
  • Well, PyLucene seems to cater to a similar community. Also, some people are even ready to do their full-text searches in Java because of Lucene ;-) – Yuval F Jan 13 '09 at 09:42
2

+1 to the Xapian and Pyndexter answers.

Ferret is actually written in C with Ruby bindings on top. A pure Ruby search engine would be even slower than a pure Python one. I would love to see "someone else" write a Cython/Pyrex layer providing a Python interface to Ferret, but I won't do it myself: why bother when there are already Python bindings for Xapian?

Van Gale
  • Thanks. I used the term "pure" in a dirty way. =) If I can install it with easy_setup or the like I'm happy. – PEZ Feb 07 '09 at 11:38
1

After weeks of searching for this, I found a nice Python solution: repoze.catalog. It's not strictly Python-only because it uses ZODB for storage, but that seems a better dependency to me than something like Solr.

Ali Afshar