I am trying to train an NER model for Indian with a custom NE (named entity) dictionary for chunking. I have looked at NLTK and Stanford NER:
- NLTK
I found that nltk.chunk.named_entity.NEChunkParser can be trained on a custom corpus. However, the format of the training corpus is not specified in the documentation or in the comments of the source code.
Where can I find a guide to the custom corpus format for NER in NLTK?
- Stanford NER
According to the question, the Stanford NER FAQ gives directions on how to train a custom NER model.
My major concern is that Stanford NER does not support Indian by default. Is it viable to feed an Indian NER corpus to the model?
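
For reference, here is a minimal sketch of how I understand the training file described in the Stanford NER FAQ: one token per line with its label in a second, tab-separated column. The helper name `to_stanford_tsv`, the toy sentences, and the labels are my own placeholders, and the blank line between sentences is an assumption on my part rather than something I have confirmed.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def to_stanford_tsv(sentences: Iterable[Sentence]) -> str:
    """Render sentences as 'token<TAB>label' lines.

    Hypothetical helper, not part of Stanford NER. Sentences are
    separated by a blank line (an assumption about the expected layout).
    """
    blocks = []
    for sentence in sentences:
        lines = [f"{token}\t{label}" for token, label in sentence]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Placeholder data; "O" marks tokens outside any named entity.
sentences = [
    [("Sachin", "PERSON"), ("lives", "O"), ("in", "O"), ("Mumbai", "LOCATION")],
    [("He", "O"), ("plays", "O"), ("cricket", "O")],
]

tsv = to_stanford_tsv(sentences)
print(tsv)
```

If this layout is right, the same function should work for an Indian corpus as long as the text is already tokenized, since it only writes out the tokens and labels it is given.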