I tried to mark paragraphs that end with a space. In the Ruta workspace, the SpaceBeforeEnter rule runs quickly, but in the Java workspace the same rule takes longer to execute. Both workspaces use the same versions: uimaj-core 2.10.2, ruta-core 2.8.1 and ruta-core-ext 2.8.1. If I use the rule below in SCRIPT1, I don't face the time issue.

Rule:

DECLARE SpaceBeforeEnter;
RETAINTYPE(BREAK,SPACE,MARKUP);
SPACE+?{-PARTOF(Footnote_Block),-PARTOF(Endnote_Block),-PARTOF(TAB),-PARTOF(SpaceBeforeEnter)} MARKUP*?{-PARTOF(Math),-PARTOF(IMG)->MARK(SpaceBeforeEnter,1)} @BREAK;
RETAINTYPE;

Code Snippet:

Initialise Engine and Run Process:

Ruta script1Ruta = callRuta(Path.Engine.SCRIPT1.ordinal());
if (script1Ruta.exception == null) {
    script1Ruta.run();
    if (script1Ruta.exception != null) {
        throw new Exception("Failed.");
    }
} else {
    throw new Exception("Failed.");
}
try {
    Ruta script2Ruta = callRuta(script1Ruta.getCasIndex(), Path.Engine.SCRIPT2.ordinal());
    script2Ruta.run();
} catch (Exception e) {
    logger.error(e.getMessage());
}
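The code above builds the engine (`runScript`, called from the `Ruta` constructor) and runs it (`run()`) in a single flow. Timing the two phases separately would show whether the extra time in the Java workspace comes from engine creation (parsing the script and descriptor) or from rule execution. The helper below is a minimal sketch that uses only the JDK (`System.nanoTime`). `PhaseTimer`, `Timed` and `time` are hypothetical names and are not part of UIMA or Ruta.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical JDK-only helper for timing engine creation and processing separately. */
public final class PhaseTimer {

    private PhaseTimer() {
    }

    /** Value returned by a timed call, plus the elapsed wall-clock time in milliseconds. */
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the work once and returns its result together with the elapsed time. */
    public static <T> Timed<T> time(Callable<T> work) throws Exception {
        long start = System.nanoTime();
        T value = work.call(); // checked exceptions (e.g. from process) propagate unchanged
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timed<>(value, elapsed);
    }
}
```

In the posted code, the creation step could be wrapped as `PhaseTimer.time(() -> runScript(engineFile))`. The processing step could be wrapped as `PhaseTimer.time(() -> { engine[0].process(cas); return null; })`. The block lambda ending in `return null;` works regardless of what `process` returns.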

callRuta():

private Ruta callRuta(int script1CasIndex, int engine) {
    int[] engines = new int[1];
    engines[0] = engine;
    return (new Ruta(script1CasIndex, engines));
}

private Ruta callRuta(int engine) {
    int[] engines = new int[1];
    engines[0] = engine;
    return (new Ruta(engines));
}

public Ruta(int casIndex, int[] engines) {
    createEngines(engines.length);
    this.engineIndex[0] = engines[0];
    this.casIndex = casIndex;
    if (casList.size() > casIndex) {
        cas = casList.get(casIndex);
    }
    try {
        File engineFile = Path.getEngineFiles(engineIndex[0]).get(0);
        engine[0] = runScript(engineFile);
        if (cas == null) { // Create new CAS for first script or if cas=null
            cas = engine[0].newCAS();
            artifact = readFile(Path.getXhtmlFile().getAbsolutePath());
            cas.setDocumentText(artifact);
            casList.add(cas);
        } else {
            logger.info("Existing CAS is used: " + casIndex);
        }
    } catch (Exception e) {
        logger.error(e.getMessage());
    }
    
}
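This constructor calls `runScript` every time a `Ruta` object is created. rec's comment notes that things can slow down if `runScript` is invoked for every single document. Below is a minimal sketch of building one engine per script file and reusing it afterwards. `EngineCache` and `EngineFactory` are hypothetical names. The engine type is left generic so the sketch compiles with the JDK alone, and it assumes a single processing thread.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: creates one engine per script file on first use and returns
 * the same instance afterwards. E would be AnalysisEngine in the real code.
 * Not thread-safe; intended for a single processing thread.
 */
public final class EngineCache<E> {

    /** Builds an engine for a script file, like a method such as runScript(File) throws Exception. */
    @FunctionalInterface
    public interface EngineFactory<T> {
        T create(File scriptFile) throws Exception;
    }

    private final Map<String, E> engines = new HashMap<>();
    private final EngineFactory<E> factory;

    public EngineCache(EngineFactory<E> factory) {
        this.factory = factory;
    }

    /** Returns the engine for this script, creating it only the first time it is requested. */
    public E get(File scriptFile) throws Exception {
        String key = scriptFile.getAbsolutePath();
        E engine = engines.get(key);
        if (engine == null) {
            engine = factory.create(scriptFile); // expensive step runs once per script file
            engines.put(key, engine);
        }
        return engine;
    }
}
```

In the posted code, the factory would be `file -> runScript(file)`. The cache would need to outlive individual `Ruta` objects, in the same way that `casList` is shared between them.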

run():

public void run() {
    try {
        engine[0].process(cas);
        if (cas == null) {
            logger.debug("OUTPUT CAS is null");
        } else {
            logger.debug("OUTPUT CAS is NOT null");
        }
        NoException = true;
    } catch (Exception e) {
        logger.error(e.getMessage());
    }
}
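In the posted code, the caller checks `script1Ruta.exception` after `run()`. However, `run()` as shown only logs `e.getMessage()` and never stores the exception, so that check would not detect a failure inside `process`. Also, `getMessage()` can be `null`, which leaves an empty log line. Below is a minimal sketch of a `run()` that records the failure. `RecordingRunner` and `Step` are hypothetical names, and `java.util.logging` stands in for whatever `logger` the original uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the wrapper's run(): keeps the failure so callers can check it. */
public class RecordingRunner {

    /** One unit of work that may fail, e.g. engine[0].process(cas). */
    @FunctionalInterface
    public interface Step {
        void run() throws Exception;
    }

    private static final Logger LOGGER = Logger.getLogger(RecordingRunner.class.getName());

    /** Null after a successful run; otherwise the exception thrown by the step. */
    public Exception exception;

    public void run(Step step) {
        exception = null; // clear the result of any earlier run
        try {
            step.run();
        } catch (Exception e) {
            exception = e; // lets the caller's "exception != null" check see the failure
            LOGGER.log(Level.SEVERE, "Processing failed", e); // logs the stack trace, not just the message
        }
    }
}
```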

runScript():

private AnalysisEngine runScript(File engineFile) throws Exception {

    // folder containing the engine descriptor; used as script, descriptor and resource path
    String path = new File(engineFile.toURI()).getParentFile().getAbsolutePath();
    String[] pathsArray = new String[] { path };

    // override the values in the descriptor when creating the description
    AnalysisEngineDescription desc = AnalysisEngineFactory.createEngineDescriptionFromPath(engineFile.getAbsolutePath(),
            RutaEngine.PARAM_SCRIPT_PATHS, pathsArray, RutaEngine.PARAM_DESCRIPTOR_PATHS, pathsArray,
            RutaEngine.PARAM_RESOURCE_PATHS, pathsArray);

    // in case the location of the descriptor is not known...
    URL sourceUrl = desc.getSourceUrl();
    path = new File(sourceUrl.toURI()).getParentFile().getAbsolutePath();
    pathsArray = new String[] { path };

    // set the values in the description
    ConfigurationParameterSettings settings = desc.getAnalysisEngineMetaData().getConfigurationParameterSettings();
    settings.setParameterValue(RutaEngine.PARAM_SCRIPT_PATHS, pathsArray);
    settings.setParameterValue(RutaEngine.PARAM_DESCRIPTOR_PATHS, pathsArray);
    settings.setParameterValue(RutaEngine.PARAM_RESOURCE_PATHS, pathsArray);

    // override the values in the descriptor when creating the analysis
    // engine
    AnalysisEngine ae = AnalysisEngineFactory.createEngine(desc, RutaEngine.PARAM_SCRIPT_PATHS, pathsArray,
            RutaEngine.PARAM_DESCRIPTOR_PATHS, pathsArray, RutaEngine.PARAM_RESOURCE_PATHS, pathsArray);

    // set the values in the analysis engine and reconfigure it
    ae.setConfigParameterValue(RutaEngine.PARAM_SCRIPT_PATHS, pathsArray);
    ae.setConfigParameterValue(RutaEngine.PARAM_DESCRIPTOR_PATHS, pathsArray);
    ae.setConfigParameterValue(RutaEngine.PARAM_RESOURCE_PATHS, pathsArray);

    ae.reconfigure();
    return ae; //returns Analysis Engine
}
  • How do you invoke UIMA/Ruta from Java? – rec Dec 13 '20 at 07:42
  • @rec Updated the question. – Sugunalakshmi Pagemajik Dec 14 '20 at 04:52
  • The code you added initializes the Ruta engine and returns it. Although that should be fairly quick, it can become slow e.g. if you run a loop over your documents and invoke `runScript` for each single document. Also, things can slow down if you create a new (J)Cas object for every document. Are you re-using the engine and the CAS? – rec Dec 15 '20 at 06:53
  • @rec Yes, I'm reusing the CAS. Updated the question. I'm also facing the same problem for some more rules. One similarity is MARKUP is retained for all the rules. – Sugunalakshmi Pagemajik Dec 15 '20 at 17:26
  • Could it be due to any of the ConfigurationParameterSettings? – Sugunalakshmi Pagemajik Dec 15 '20 at 17:37
  • Can you check whether the same seeders are used? The default seeder (MARKUP) is much slower than the text seeder for XHTML. Can you check whether any debug params are activated? Is there a larger difference between the type system in the classpath and the one available in the Ruta Workbench? – Peter Kluegl Dec 16 '20 at 13:56
  • Regardless of this problem: '+?' and '*?' are rather slow and should be avoided if runtime performance is important. – Peter Kluegl Dec 16 '20 at 14:03
  • Both script engines use org.apache.uima.ruta.seed.DefaultSeeder. The debug param is set to false and the debugWithMatches param is set to true. – Sugunalakshmi Pagemajik Dec 17 '20 at 04:17
  • "'+?' and '*?' are rather slow and should be avoided if runtime performance is important." => What is the alternative quantifier? Also, in my script, when I use '+?' and '*?', only some of the rules run quickly. – Sugunalakshmi Pagemajik Dec 17 '20 at 04:33
  • It is faster to use + with an additional condition instead of +?. For example, `ANY+{-PARTOF(NUM)} NUM` instead of `ANY+? NUM`. – Peter Kluegl Feb 25 '21 at 13:09
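Peter Kluegl's suggestion can be illustrated by analogy with Java regular expressions. This sketch uses `java.util.regex`, not Ruta, and says nothing about Ruta's internals. A reluctant `.+?` has to retry the part that follows it each time it grows by one character. A greedy `[^0-9]+` stops at the first digit on its own, much as the `-PARTOF(NUM)` condition limits `ANY+`. The class and pattern names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Analogy only (java.util.regex, not Ruta): reluctant quantifier vs. greedy quantifier with a condition. */
public final class ReluctantVsConstrained {

    private ReluctantVsConstrained() {
    }

    // Like ANY+? NUM: the lazy ".+?" retries the digit part after every character it adds.
    public static final Pattern RELUCTANT = Pattern.compile(".+?\\d+");

    // Like ANY+{-PARTOF(NUM)} NUM: the greedy class runs until the first digit, then the digits follow.
    public static final Pattern CONSTRAINED = Pattern.compile("[^0-9]+\\d+");

    /** Returns the first match of the pattern in the input, or null if there is none. */
    public static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

On `Chapter 12 and 13`, both patterns match `Chapter 12`. The two forms are not strictly interchangeable, though: on `1a2` the reluctant pattern matches `1a2`, while the constrained one matches `a2`, because the condition also keeps digits out of the prefix.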
