What is the use of Lucene index files in DBpedia Spotlight?

Question

I am trying to find named entities in a given text. To do that, I tried the DBpedia Spotlight service.

I am able to get a response from it. However, the DBpedia dataset is limited, so I tried replacing their spotter.dict file with my own dictionary. My dictionary contains one entity per line, with the surface form and its type separated by `###`:

Sachin Tendulkar###PERSON

Barack Obama ###PERSON

... etc.

I then parse this file and build an ExactDictionaryChunker object from it.
With that in place (and after modifying the DBpedia Spotlight code), I am able to get the entities and their types.
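For illustration, the parsing step described above might look roughly like the following. This is only a sketch in Python, not the code actually used in the question: the function name `parse_dictionary` is hypothetical, and it assumes each line is a surface form and a type joined by `###`, tolerating stray whitespace around the separator. Building the ExactDictionaryChunker itself is not shown.

```python
def parse_dictionary(lines):
    """Parse 'surface form###TYPE' lines into a {surface form: type} dict.

    Hypothetical helper for illustration; blank or malformed lines are skipped.
    """
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or "###" not in line:
            continue  # ignore blank lines and lines without the separator
        phrase, _, category = line.partition("###")
        phrase, category = phrase.strip(), category.strip()
        if phrase and category:
            entries[phrase] = category
    return entries


sample = [
    "Sachin Tendulkar###PERSON",
    "Barack Obama ###PERSON",
    "",
]
entries = parse_dictionary(sample)
print(entries)
```

Each resulting pair would then be added to whatever dictionary structure the chunker is built from.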

My question is: DBpedia Spotlight uses Lucene index files, and I don't understand what purpose they serve.

Can entity detection be done without the index files? What is the significance of the index files?

Looks like there is some explanation of how Lucene is used in their [Github wiki](https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Lucene---Architecture) — femtoRgon, Feb 21 '14 at 17:04
Thanks for your response. But that page does not really discuss the Lucene index itself; it's too abstract. — Sreedhar GS, Feb 25 '14 at 08:18

score 0 · Answer 1 · answered Aug 01 '15 at 19:35

Lucene was used in the earlier implementation of DBpedia Spotlight to store a model of each entity in our knowledge base (KB). This model is used to give us a relatedness measure between the context (extracted from your input text) and the entity. More concretely, each entity is represented by a vector of term scores {t1: score1, t2: score2, ... }. At runtime we model your input text as a vector in the same dimensions and measure the cosine similarity between the input vector and each entity vector. In your case, you would have to add a vector for Sachin Tendulkar to the space (that is, add a document to the Lucene index) if it is not already there. The latest implementation, though, has moved away from Lucene to an in-house in-memory context store: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
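To make the vector comparison concrete, here is a minimal sketch in Python. It is not Spotlight's code: the entity vectors and their scores are made up, raw word counts stand in for whatever term weighting Spotlight actually applies, and the names `cosine`, `context_vector`, and `rank_entities` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: score}."""
    dot = sum(score * b.get(term, 0.0) for term, score in a.items())
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def context_vector(text):
    """Model the input text as term counts (an assumed, simplistic weighting)."""
    return Counter(text.lower().split())


# Made-up entity models; in the Lucene-based version each of these
# would correspond to one document in the index.
entity_vectors = {
    "Sachin Tendulkar": {"cricket": 3.0, "india": 2.0, "batsman": 2.5},
    "Barack Obama": {"president": 3.0, "usa": 2.0, "election": 1.5},
}


def rank_entities(text):
    """Return (similarity, entity) pairs, most similar first."""
    ctx = context_vector(text)
    scored = [(cosine(ctx, vec), name) for name, vec in entity_vectors.items()]
    return sorted(scored, reverse=True)


ranking = rank_entities("the batsman scored a century for india in cricket")
print(ranking)
```

In this toy setup, adding a new entity just means adding another entry to `entity_vectors`, which mirrors adding a document to the index. A dictionary-only spotter can find mentions without any of this, but it has no context score for choosing between candidate entities that share a surface form.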
