11

Ever since large web applications came into existence, searching data quickly and accurately has been one of their most important problems. For a while, I've worked with Lucene.NET, which is a C# port of the Lucene project.

I also work in PHP with Zend Framework's Lucene API, which brings me to my question. To index well, we usually need to apply NLP steps such as tokenizing, lemmatizing, and more. The question is:

Do you know of any good NLP programming framework/toolset using PHP?

PS: I'm well aware of the Zend API for Lucene, but indexing data properly is not just storing it and relying on Lucene; you need to perform some extra tasks, like those above.
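
To make the "extra tasks" concrete, here is a minimal, language-neutral sketch of a pre-indexing analysis step, written in Python only for brevity. The suffix list and the function names (`tokenize`, `naive_stem`, `analyze`) are hypothetical and purely illustrative; this is crude suffix stripping, not real lemmatization, and not the API of Lucene or Zend.

```python
import re

# Hypothetical, deliberately tiny suffix list; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Return the normalized terms that would be handed to the indexer."""
    return [naive_stem(token) for token in tokenize(text)]


if __name__ == "__main__":
    print(analyze("Indexing documents, quickly!"))
```

The same two steps (split into tokens, then normalize each token) are what a tokenizer and stemmer or lemmatizer would do before the terms reach the index.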

David Conde

3 Answers

7

I would suggest that you look at Solr, which is a best-practice implementation of Lucene. Solr exposes a REST-based API that also has a very good PHP client. This lets you leverage the power of Lucene without doing any of the low-level programming to get the NLP power that you want. You would also probably want to grab the trunk version of Solr, as NLP development is very active right now and new capabilities are being added every day.
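
As a rough illustration of what talking to Solr over HTTP looks like, the sketch below only builds a query URL for Solr's `select` handler using the standard `q`, `rows`, and `wt` parameters. The host, port, core name (`articles`), and field name (`title`) are assumptions made for the example, and the helper `build_solr_query_url` is hypothetical; a PHP client library would wrap the same request.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, query, rows=10):
    """Build a URL for Solr's select handler, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance and core name; adjust to your own setup.
    print(build_solr_query_url("http://localhost:8983/solr", "articles", "title:lucene"))
```

Tokenizing and stemming then happen on the Solr side, according to the analyzers configured for each field, so the client only sends plain text.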

Paige Cook
4

Zend has a full port of Lucene to PHP. See docs here.

Xodarap
  • Yes, I'm aware of it, and I use it, but the NLP tools I meant were things like tokenizers, name parsers, or something similar. I'll edit the question anyway, because perhaps it is not clear enough. – David Conde Dec 17 '10 at 05:21
  • @David: I added more to my answer; Lucene can indeed tokenize and lemmatize. – Xodarap Dec 17 '10 at 17:29
  • I'm also aware of the abilities of Lucene, but you are pointing to the original Java project, and I think the Zend port does not contain them, so I'm still in the same position. Thanks anyway. – David Conde Dec 18 '10 at 05:21
  • 1
    @David: Zend has ported a lot. It obviously can [tokenize](http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Search/Lucene/Analysis/) (or else it would be useless), and there are also [stemmers](http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/). – Xodarap Dec 20 '10 at 04:06
  • @Xodarap: Some links are returning a 404 (page not found). – Lijo Abraham Apr 26 '16 at 13:40
0

Seems like you are looking for the same stuff I googled a few months back :D... I'm running a PHP/Zend-based project with Solr (via the php-solr-client lib), and so far I haven't found anything in PHP for advanced NLP. For basic stuff, as everyone mentions, you can get away with Solr (stemming, tag clouds / phrase tag clouds, tokenizing, etc.), and there are a few basic but useful text-processing PHP libraries out there (nothing fancy really, better to rely on Solr itself)... but if you are looking for more algorithmic/semantic/sentiment NLP analysis, I suggest you move away from PHP a bit and get into Java, as there are more libraries that can help you in this area (such as OpenNLP). In case the advanced stuff is what you are looking for, you might want to take a look at Mahout:

http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

Osvaldo Mercado