My Django application needs to search large volumes of chat logs that are stored in another Postgres DB, i.e. a different one than my Django app's DB. Initially, users on the site would run simple full-text searches over the logs, but later we intend to parse these logs using NLP.

What would be a better indexing option in this case — Sphinx or Solr?

I'm looking for something that is FOSS, scales well, supports NLP and has good Python/Django bindings, unless any of you has a better way or tool to accomplish this.

Sorry if I've gotten anything wrong above. I'm new to the concept of implementing anything like this and am trying to best grasp these as quickly as possible.

Léo Léopold Hertz 준영
Mridang Agarwalla

2 Answers

1

Also check out Haystack

Gourneau
0

It won't be completely painless to implement, but I think if you want to do full text search the clear answer is Solr/Lucene as far as open source implementations go. Caveat: I don't use Solr with Python, and I've never used Sphinx.

The pipeline would be roughly: read the logs from the DB, index them, store the index on whatever server you like, and then search against that index.
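To make those steps concrete, here is a minimal sketch of the read/index/search flow. It is not Solr or Lucene: an in-memory SQLite table stands in for the Postgres log DB, and a plain Python inverted index stands in for the search engine, so the example runs with the standard library alone. The table name `chat_log`, its columns, and all function names are made up for illustration.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a real analyzer does far more)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def read_logs(conn):
    """Read (id, message) rows from the log database."""
    return conn.execute("SELECT id, message FROM chat_log ORDER BY id").fetchall()


def build_index(rows):
    """Build an inverted index: term -> set of ids of the rows containing it."""
    index = defaultdict(set)
    for row_id, message in rows:
        for term in tokenize(message):
            index[term].add(row_id)
    return index


def search(index, query):
    """Return the sorted ids of rows that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


def demo():
    # Assumption: SQLite in memory replaces the separate Postgres log DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chat_log (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany(
        "INSERT INTO chat_log (id, message) VALUES (?, ?)",
        [
            (1, "Hello, is the server down?"),
            (2, "The server is back up now."),
            (3, "Thanks, see you later!"),
        ],
    )
    index = build_index(read_logs(conn))
    conn.close()
    return index
```

Calling `search(demo(), "server down")` looks up both terms and intersects their row ids. In a real deployment the index would live in the search server rather than in process memory, and `tokenize` is the point where an engine's analyzer chain would run.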

Adding extra/custom NLP stuff into the Lucene indexer is pretty easy.

This thread comparing Lucene and ElasticSearch may be worth looking at.

Community
nflacco