In GATE, you don't have to worry about reading and converting common file formats such as .txt, .pdf, and .html. GATE does that automatically.
Initialize GATE like this:
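import java.io.File;
import gate.CorpusController;
import gate.Gate;
import gate.util.GateException;
import gate.util.persistence.PersistenceManager;

// Fields used below; the original snippet assumed them without declaring them
private static File gappFile;
private static CorpusController gateApplication;

private static void initGateApplication(String gateXgappFileLoc, String gateHome) {
    try {
        // Point GATE at its installation directory unless that is already set
        try {
            if (Gate.getGateHome() == null)
                Gate.setGateHome(new File(gateHome));
        }
        catch (Exception ex) {
            ex.printStackTrace(System.out);
        }
        // Initialise the GATE library only once per JVM
        try {
            if (!Gate.isInitialised())
                Gate.init();
        }
        catch (GateException e) {
            e.printStackTrace(System.out);
        }
        System.out.println("Initializing gate application...");
        // Load the saved application (.xgapp / .gapp) as a runnable pipeline
        gappFile = new File(gateXgappFileLoc);
        gateApplication = (CorpusController) PersistenceManager.loadObjectFromFile(gappFile);
    }
    catch (Exception e) {
        e.printStackTrace(System.out);
    }
}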
And run your GATE pipeline with your text file:
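import java.io.File;
import java.io.IOException;
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.util.GateException;

public void extract(String inputFileName, String docID, CorpusController gateApplication) throws GateException, IOException
{
    // Assumption: the input is UTF-8 (the original used an undeclared 'encoding' variable)
    String encoding = "UTF-8";
    CorpusController application = gateApplication;
    Corpus corpus = Factory.newCorpus("Sample Corpus");
    application.setCorpus(corpus);
    File docFile = new File(inputFileName);
    System.out.print("Processing document " + docFile + "...");
    // File.toURL() is deprecated; going through toURI() escapes spaces and other special characters
    Document doc = Factory.newDocument(docFile.toURI().toURL(), encoding);
    // add document to the corpus
    corpus.add(doc);
    // run the application
    application.execute();
    System.out.println("Done running GATE pipeline...");
    // The annotations are now available from the 'doc' object, e.g. via doc.getAnnotations()
}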
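One detail in the extract method worth a closer look is how the input file becomes the URL passed to Factory.newDocument. The sketch below uses only the JDK, so it runs without GATE. The class name FileUrlSketch, the helper toDocumentUrl, and the sample path are made up for illustration. It shows that converting through toURI() percent-encodes characters such as spaces, which the deprecated File.toURL() does not do.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of GATE.
class FileUrlSketch {

    // Converts a file path to a URL, escaping characters that are illegal in URLs.
    static URL toDocumentUrl(File file) {
        try {
            // toURI() resolves the path against the working directory and
            // percent-encodes it; toURL() then wraps the result as a java.net.URL.
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample path with spaces; the file does not need to exist for the conversion.
        URL url = toDocumentUrl(new File("my docs/sample file.txt"));
        // The printed URL starts with "file:" and contains %20 in place of each space.
        System.out.println(url);
    }
}
```

The resulting URL can be passed as the first argument of Factory.newDocument, as in the extract method above.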