I went through their GitHub files as well as the official site, but I can't find the named entity tagging training corpus they used in Spotlight.

How can I find the dataset itself rather than a trained model?

Tilney

1 Answer

See this link: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service

It explains how to set up DBpedia Lookup offline. It also lists four tar files:

  • redirects_en.nt
  • short_abstracts_en.nt
  • instance_types_en.nt
  • article_categories_en.nt

These are supposed to be the training data for it.
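
If you do end up parsing the files yourself, here is a rough sketch of reading the entity types. It assumes instance_types_en.nt is in N-Triples format with one triple per line and IRIs in angle brackets. The function name `read_instance_types` and the sample lines are made up for illustration; they are not part of DBpedia Spotlight. This only builds an entity-to-type lookup, it does not produce annotated NER sentences.

```python
import re
from collections import defaultdict

# Matches an N-Triples line whose subject, predicate, and object are all IRIs.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def read_instance_types(lines):
    """Map each subject IRI to the set of rdf:type IRIs found in `lines`.

    Hypothetical helper: `lines` is any iterable of strings, such as an
    open file handle or a list of sample triples.
    """
    types = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # skip literals, blank nodes, and malformed lines
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE:
            types[subject].add(obj)
    return dict(types)


# Illustrative sample lines (assumed shape of the file, not copied from it).
sample = [
    "# sample data",
    "<http://dbpedia.org/resource/Berlin> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/City> .",
    "<http://dbpedia.org/resource/Berlin> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/Place> .",
]
entity_types = read_instance_types(sample)
```

For the real file you would pass an open file handle instead of the sample list, for example `read_instance_types(open("instance_types_en.nt", encoding="utf-8"))`, since the function only needs an iterable of lines.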

Gunjan
  • The link you refer to is a guide to using the dbpedia-spotlight services; I didn't find any information there on how to generate an NER training corpus. It's true that we could generate one ourselves from the 4 tar files, but the whole parsing process is time-consuming and, more importantly, it's not part of our core logic. So I was looking for a tool that generates NER training data, as I posted before (http://github.com/dbpedia-spotlight/pignlproc). – Tilney Dec 15 '14 at 03:24