
I have trained a Hidden Markov Model (HMM) tagger to extract some user-defined entities. I am now trying to run a classifier to extract various relationships and to resolve ambiguity among the extracted entities. For both of these supervised algorithms I have kept 80% of the data for training and 20% for testing. Since I am not comparing the performance of different models, I am not keeping any data for validation or cross-validation. Is this approach fine? I have tried to read some material: a Stackexchange Post, Previous post1, Previous Post2, and a Wikipedia Article.
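A minimal sketch of the 80/20 split described above, using NLTK's supervised HMM trainer; the example sentences and the PERS/LOC/O tag set are hypothetical placeholders, not the actual data:

```python
# Sketch: 80/20 train/test split for a supervised HMM tagger.
# The sentences below are made-up placeholders for the real annotated data.
import random
from nltk.tag import hmm
from nltk.probability import LidstoneProbDist

tagged_sents = [
    [("Alice", "PERS"), ("lives", "O"), ("in", "O"), ("Paris", "LOC")],
    [("Bob", "PERS"), ("visited", "O"), ("Berlin", "LOC")],
    [("Carol", "PERS"), ("works", "O"), ("in", "O"), ("London", "LOC")],
    [("Dave", "PERS"), ("moved", "O"), ("to", "O"), ("Madrid", "LOC")],
    [("Eve", "PERS"), ("flew", "O"), ("to", "O"), ("Rome", "LOC")],
]

random.seed(42)
random.shuffle(tagged_sents)
split = int(0.8 * len(tagged_sents))            # 80% for training
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

trainer = hmm.HiddenMarkovModelTrainer()
# Lidstone smoothing so unseen words in the test set get nonzero probability.
tagger = trainer.train_supervised(
    train_sents,
    estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins))

# Tagging accuracy on the held-out 20%; this measures generalization of
# the single chosen model, not model selection.
print(tagger.evaluate(test_sents))
```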

  • Fine with what? And if you are not comparing anything, then why do you have a test set at all? – lejlot Apr 26 '16 at 18:14
  • I meant to ask: is my approach fine? On the test data I am comparing the chosen model's performance against its performance on the training data. By "comparison" I meant a performance comparison among various models. – Coeus2016 Apr 26 '16 at 18:21
  • In order to answer whether the setting is OK, we will need an exact description of what you do and what you are **trying to achieve**. – lejlot Apr 26 '16 at 18:33
  • I have user-defined labels. I am training an HMM tagger on them, as if to extract named entities. Suppose the data in one document is tagged with PERS and LOC; I then take the base strings of these tagged entities, label each pair with a relation such as PERS-LOC, and train a simple multiclass classifier, say Naive Bayes (NB), on this new data to extract relations. – Coeus2016 Apr 26 '16 at 19:17
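A hedged sketch of the second step described in this comment thread: a multiclass Naive Bayes classifier over entity-pair strings with relation labels like PERS-LOC, again with an 80/20 split. The bag-of-words features, the scikit-learn pipeline, and the toy examples are illustrative assumptions, not the poster's actual setup:

```python
# Sketch: Naive Bayes relation classifier over entity-pair strings.
# Examples, labels, and features are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each example: the surface strings of an entity pair plus nearby context
# words; each label: the relation between the two entity types.
pairs = [
    "Alice lives in Paris",
    "Bob visited Berlin",
    "Carol works in London",
    "Acme employs Dave",
    "Globex hired Eve",
]
labels = ["PERS-LOC", "PERS-LOC", "PERS-LOC", "ORG-PERS", "ORG-PERS"]

X_train, X_test, y_train, y_test = train_test_split(
    pairs, labels, test_size=0.2, random_state=42)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out 20%
```

Since no hyperparameter tuning or model selection happens in either step, the held-out 20% serves purely as a final generalization check, which is consistent with skipping a separate validation set.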

0 Answers