5

I’m looking for a library that performs text analysis and extracts entities.

The type or classification of an entity is not critical; what matters is identifying anything worthwhile. The universe of entities in this case is unbounded: it is not limited to a fixed dictionary.

Several web services seem to do this (NERD lets you compare their results, which is pretty useful: http://nerd.eurecom.fr/documentation), but I’m looking for a local library, not a remotely hosted service. I’d prefer Java or .NET, but if it’s a good library I’ll learn whatever language it’s written in.

There are a few older threads on similar topics, and I was hoping to find new developments in this area and/or libraries built on top of lower-level NLP libraries.

Does anyone know about a good library that does a decent job?

hi1869695

3 Answers

3

I've researched, but never used, the following hosted entity identification services:

OpenCalais

AlchemyAPI

cmbaron
  • Updated the question to highlight that I'm looking for a local library and not a remotely hosted service. – hi1869695 Dec 02 '12 at 05:39
1

If you are comfortable with Perl, there are several language taggers and part-of-speech taggers available. Lingua::TreeTagger and Lingua::BrillTagger come to mind (found via Google).

Mark Leighton Fisher
0

You could use NLTK with Python. See this question for an example of using NLTK to do named entity recognition (NER).
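For a rough idea of what that looks like, here is a minimal sketch. It assumes NLTK is installed and that its tokenizer, tagger, and chunker data have already been fetched with `nltk.download` (the exact data package names vary between NLTK versions). The function names `extract_entities` and `entities_from_chunks` are made up for illustration. Passing `binary=True` to `nltk.ne_chunk` marks entities without classifying them, which seems to fit the point that the entity type is not critical here.

```python
def entities_from_chunks(chunked):
    """Collect entity strings from ne_chunk-style output, ignoring entity type.

    Hypothetical helper: `chunked` is an iterable whose items are either
    (word, tag) tuples or subtrees exposing label() and leaves().
    """
    entities = []
    for node in chunked:
        # Entity subtrees have a label() method; plain tokens are tuples.
        if hasattr(node, "label"):
            entities.append(" ".join(word for word, _tag in node.leaves()))
    return entities


def extract_entities(text):
    """Return candidate entity strings found in `text` (hypothetical wrapper)."""
    # Imported here so the helper above works without NLTK installed.
    import nltk

    tokens = nltk.word_tokenize(text)             # split text into word tokens
    tagged = nltk.pos_tag(tokens)                 # attach part-of-speech tags
    chunked = nltk.ne_chunk(tagged, binary=True)  # mark entities, no types
    return entities_from_chunks(chunked)
```

Calling `extract_entities` on a piece of text returns a plain list of strings. The quality depends on NLTK's bundled models, so treat it as a baseline to compare against other libraries rather than a finished solution.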

Joshua Barron