
I'm using Haystack with Whoosh as the backend for a Django app.

Is there any way to view the content of the indexes generated by Whoosh in an easy-to-read format? I'd like to see what data was indexed and how, so I can better understand how it works.

Mark Amery
daniels

2 Answers


You can do this pretty easily from Python's interactive console:

>>> from whoosh.index import open_dir
>>> ix = open_dir('whoosh_index')
>>> ix.schema
<Schema: ['author', 'author_exact', 'content', 'django_ct', 'django_id', 'id', 'lexer', 'lexer_exact', 'published', 'published_exact']>
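Only stored fields come back in search results, so it helps to know which fields you will actually be able to print. A minimal sketch that lists each field with its type and whether it is stored, assuming a Whoosh 2.x Schema whose items() yields (name, field) pairs and whose field objects carry a boolean stored attribute (describe_schema is a hypothetical helper name, not part of Whoosh):

```python
def describe_schema(schema):
    """Return one line per field: name, field type and whether it is stored.

    Assumes a Whoosh 2.x Schema (e.g. ix.schema): schema.items() yields
    (field_name, field_object) pairs, and field objects have a `stored` flag.
    """
    lines = []
    for name, field in schema.items():
        kind = type(field).__name__  # field class name, e.g. TEXT, ID, DATETIME
        stored = "stored" if getattr(field, "stored", False) else "not stored"
        lines.append("%-20s %-10s %s" % (name, kind, stored))
    return "\n".join(lines)
```

Usage: print(describe_schema(ix.schema)) in the same console session.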

You can run search queries directly against your index and do all sorts of fun stuff. To get every document, you could do this (note that search() returns only the top 10 hits unless you pass limit=None):

>>> from whoosh.query import Every
>>> results = ix.searcher().search(Every('content'), limit=None)

If you want to print it all out (for viewing or whatnot), a short Python loop will do:

# Python 3: print each hit's rank plus its stored id, author and content fields
for result in results:
    print("Rank: %s Id: %s Author: %s" % (result.rank, result['id'], result['author']))
    print("Content:")
    print(result['content'])
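If a field was not stored, result['author'] fails with a KeyError. A hedged sketch of a more forgiving formatter, assuming you pass it hit.fields() (a dict of the hit's stored fields) and the field names from the schema above; format_hit is a hypothetical helper, not part of Whoosh:

```python
import textwrap


def format_hit(fields, rank=None, width=72):
    """Render one document's stored fields as readable text.

    `fields` is a plain dict such as hit.fields() returns; missing keys are
    shown as "(not stored)" instead of raising KeyError.
    """
    parts = []
    if rank is not None:
        parts.append("Rank: %s" % rank)
    parts.append("Id: %s" % fields.get("id", "(not stored)"))
    parts.append("Author: %s" % fields.get("author", "(not stored)"))
    content = str(fields.get("content", "(not stored)"))
    # Wrap long content and indent it so documents are easy to tell apart
    body = textwrap.fill(content, width=width,
                         initial_indent="    ", subsequent_indent="    ")
    return "  ".join(parts) + "\nContent:\n" + body
```

Usage: for hit in results: print(format_hit(hit.fields(), rank=hit.rank))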

You could also return the documents directly from Whoosh in a Django view (perhaps for pretty formatting with Django's template system). Refer to the Whoosh documentation for more info: http://packages.python.org/Whoosh/index.html.

Frerich Raabe
Zach Kelling
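from whoosh.index import open_dir
ix = open_dir('whoosh_index')
# documents() with no arguments yields the stored fields of every document;
# wrap it in list() so the console shows them instead of a generator object.
list(ix.searcher().documents())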
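To get the whole index into something you can scroll through, and to see how each field was broken into terms (the "how" part of the question), a sketch along these lines may help. It assumes the Whoosh 2.x API (searcher.reader(), reader.field_terms(), a field-level indexed flag); documents_to_json and dump_index are hypothetical helper names, so treat it as an untested starting point rather than a recipe:

```python
import json
from itertools import islice


def documents_to_json(docs):
    """Serialize stored-field dicts (such as searcher.documents() yields) as readable JSON.

    default=str converts values json cannot encode natively, e.g. the
    datetime objects Whoosh returns for DATETIME fields such as 'published'.
    """
    return json.dumps(list(docs), indent=2, sort_keys=True, default=str)


def dump_index(index_dir, terms_per_field=20):
    """Print every stored document, then a sample of the indexed terms per field.

    Assumes the Whoosh 2.x API; Whoosh is imported inside the function so
    documents_to_json can be used without it.
    """
    from whoosh.index import open_dir

    ix = open_dir(index_dir)
    with ix.searcher() as searcher:
        print(documents_to_json(searcher.documents()))
        reader = searcher.reader()
        for name, field in ix.schema.items():
            if not getattr(field, "indexed", True):
                continue  # STORED-only fields have no terms to list
            # field_terms() decodes the on-disk term bytes for this field
            terms = list(islice(reader.field_terms(name), terms_per_field))
            print("%s: %s" % (name, terms))
```

For example, dump_index('whoosh_index') should print the documents as JSON followed by up to 20 terms per field; comparing a field's terms with its stored text shows what the analyzer did to it (lowercasing, stemming and so on).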
Collin Anderson