I am trying to find named entities in a given text. For that, I have tried using DBPedia spotlight service.
I am able to get a response out of that. However, the DBPedia dataset is limited, so I tried replacing their spotter.dict file with my own dictionary. My dictionary contains entities per line:
Sachin Tendulkar###PERSON
Barack Obama ###PERSON
.... etc
Then I parse this file and build an
ExactDictionaryChunker
object.Now I am able to get the entities and their types (after modification of dbpedia code).
My Question is: DBPedia spotlight is using Lucene Index files. I really don't understand for what purpose they are using these files?
Can we do it without using Index files? Whats the significance of the index files?