I'm a Java beginner and I need to do the following:

- I have a txt file as input with text that I want to analyse in GATE;
- I want to get GATE to start automatically and run its linguistic analysis (Corpus Pipeline) on this text.

My idea is to open and read the txt file in Java and then convert it to a GATE document, but I have the following questions:

1) how do I convert the text to a GATE doc?

2) how do I get GATE to start automatically?

Thanks for helping me out.

user3729787
You can take a look at similar questions: http://stackoverflow.com/questions/2171469/run-gate-pipeline-from-inside-a-java-program-without-the-gui-build-a-tomcat-app There are also examples on the GATE site itself. – Alex P Jul 01 '14 at 14:18

1 Answer

In GATE, you don't have to read and convert common file formats such as .txt, .pdf, or .html yourself. When you create a document from a file URL, GATE detects the format and extracts the text for you.

Initialize GATE like this:

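// Needs: java.io.File, gate.Gate, gate.CorpusController,
// gate.util.persistence.PersistenceManager

// The original snippet assigned these without declaring them;
// they are assumed here to be static fields of the enclosing class.
private static File gappFile;
private static CorpusController gateApplication;

private static void initGateApplication(String gateXgappFileLoc, String gateHome) {
    try {
        // Set GATE home and initialise the framework only once per JVM.
        if (!Gate.isInitialised()) {
            if (Gate.getGateHome() == null) {
                Gate.setGateHome(new File(gateHome));
            }
            Gate.init();
        }
        System.out.println("Initializing GATE application...");
        // Load the saved pipeline (.gapp/.xgapp file) as a corpus controller.
        gappFile = new File(gateXgappFileLoc);
        gateApplication = (CorpusController) PersistenceManager.loadObjectFromFile(gappFile);
    } catch (Exception e) {
        e.printStackTrace(System.out);
    }
}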

And run your GATE pipeline with your text file:

public void extract(String inputFileName, String docID, CorpusController gateApplication) throws GateException, IOException
{
      CorpusController application = gateApplication;

      Corpus corpus = Factory.newCorpus("Sample Corpus");
      application.setCorpus(corpus);

      File docFile = new File(inputFileName);
      System.out.print("Processing document " + docFile + "...");
      // File.toURL() is deprecated and does not escape special characters, so go through toURI().
      // The original snippet used an undefined 'encoding' variable; "UTF-8" is an assumption,
      // replace it with the actual encoding of your file.
      Document doc = Factory.newDocument(docFile.toURI().toURL(), "UTF-8");

      // add the document to the corpus
      corpus.add(doc);

      // run the application
      application.execute();
      System.out.println("Done running GATE pipeline...");
      // Now read the annotations from the 'doc' object, e.g. via doc.getAnnotations()
}
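
The conversion from a `File` to the URL passed to `Factory.newDocument` can be tried on its own, without GATE on the classpath. The following is only a minimal sketch: the class name `FileUrlSketch`, the helper `toDocumentUrl`, and the sample path are hypothetical and not part of GATE. It illustrates why going through `File.toURI()` is generally preferred over the deprecated `File.toURL()`: characters that are illegal in a URL, such as spaces, get percent-escaped.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

public class FileUrlSketch {

    // Hypothetical helper: turns a file path into a properly escaped file: URL.
    static URL toDocumentUrl(String path) {
        try {
            // toURI() resolves the path to an absolute one and escapes illegal
            // characters; toURL() then converts the URI into a java.net.URL.
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The file does not have to exist for the conversion to work.
        URL url = toDocumentUrl("my docs/input.txt");
        System.out.println(url);
    }
}
```

The printed URL is absolute and depends on the working directory, but the space in the directory name appears as `%20`. The resulting `URL` is what you would hand to `Factory.newDocument` together with the encoding.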
Ramanan