Is it possible to check the sequence of numbered items in UIMA Ruta? For example:

Input File:

some text 
Fig 1.1
Table 1.1
Fig 1.2
some text
Pic 1.2
Table 1.2
some text
Table 1.3
Pic 1.3
some text
Fig 1.4
some text
Table 1.4
some text
Table 1.5
Fig 1.6
Box 1.1
Fig 1.5

How can I find the missing figure (Fig 1.3)?

2 Answers

Here's an example of how it could be done with UIMA Ruta 2.5.0.

Input text:

some text 
Fig 1.1
some text
Pic 1.2
some text
Pic 1.3
some text
Fig 1.4
some text

Rule script:

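// FigureInd marks the literal "Fig"; FigureMention carries the parsed numbers.
DECLARE FigureInd;
DECLARE FigureMention (INT chapter, INT section);

// Shorthand action that creates a FigureMention with both features set.
ACTION FM(INT chap, INT sect) = CREATE(FigureMention, "chapter" = chap, "section" = sect);

"Fig"-> FigureInd;

// "Fig" followed by <chapter>.<section> becomes a FigureMention.
INT c, s;
(FigureInd NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){-> FM(c,s)};

// Mark the text between two mentions of the same chapter whose sections are not consecutive.
DECLARE FigMissing;
f1:FigureMention #{-> FigMissing} f2:FigureMention
    {f1.chapter == f2.chapter, f1.section < (f2.section - 1)};

// Inside each gap, accept any token followed by the next expected number as a FigureMention.
INT pc, ps;
f:FigureMention{-> pc=f.chapter, ps=f.section} 
    FigMissing->{
    (ANY @NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){c==pc,s==ps+1-> FM(c,s), pc=c, ps=s};
    };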

Created FigureMention annotations:

Fig 1.1
Pic 1.2
Pic 1.3
Fig 1.4
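
The two-pass idea behind the script (first trust only "Fig" labels, then accept any label inside a detected gap if its number is the next one expected) can also be sketched outside of Ruta. The following Python is only an illustration under assumptions and is not part of UIMA Ruta: the function name `find_figure_mentions` and the regular expression are hypothetical, and the sketch only handles labels of the form `<word> <chapter>.<section>`.

```python
import re

# Hypothetical pattern (an assumption for this sketch): "<word> <chapter>.<section>".
LABELED = re.compile(r"(\w+) (\d+)\.(\d+)")

def find_figure_mentions(text):
    """Illustrative two-pass lookup; not a UIMA Ruta API."""
    # Every "<word> <chapter>.<section>" candidate in document order.
    candidates = [(m.group(1), int(m.group(2)), int(m.group(3)), m.start())
                  for m in LABELED.finditer(text)]
    # Pass 1: only candidates introduced by "Fig" are trusted.
    trusted = [c for c in candidates if c[0] == "Fig"]
    mentions = []
    for left, right in zip(trusted, trusted[1:]):
        mentions.append(left)
        # A gap exists if both mentions share a chapter and a section is skipped.
        if left[1] == right[1] and left[2] < right[2] - 1:
            expected = left[2] + 1
            # Pass 2: inside the gap, accept any label with the next expected number.
            for cand in candidates:
                in_gap = left[3] < cand[3] < right[3]
                if in_gap and cand[1] == left[1] and cand[2] == expected:
                    mentions.append(cand)
                    expected += 1
    if trusted:
        mentions.append(trusted[-1])
    return [f"{label} {chap}.{sect}" for label, chap, sect, _ in mentions]

text = ("some text \nFig 1.1\nsome text\nPic 1.2\nsome text\n"
        "Pic 1.3\nsome text\nFig 1.4\nsome text\n")
print(find_figure_mentions(text))
```

For the input text of this answer, the sketch returns the same four mentions as the list above.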

A solution for UIMA Ruta 2.4.0 is very similar, but that version does not allow features to be accessed directly through annotation label expressions. The values of these features need to be stored in additional variables, and the boolean check has to be applied after the variables have been set. Here is a solution for UIMA Ruta 2.4.0:

DECLARE FigureInd;
DECLARE FigureMention (INT chapter, INT section);

ACTION FM(INT chap, INT sect) = CREATE(FigureMention, "chapter" = chap, "section" = sect);

"Fig"-> FigureInd;

INT c, s;
(FigureInd NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){-> FM(c,s)};

DECLARE FigMissing;
INT c1,c2,s1,s2;
(FigureMention<-{FigureMention{-> ASSIGN(c1, FigureMention.chapter), ASSIGN(s1, FigureMention.section)};} 
    #{-> FigMissing} 
    FigureMention<-{FigureMention{-> ASSIGN(c2, FigureMention.chapter), ASSIGN(s2, FigureMention.section)};}) 
    {c1 == (c2), s1 < (s2 - 1)};

INT pc, ps;
f:FigureMention{-> pc=FigureMention.chapter, ps=FigureMention.section} 
    FigMissing->{
    (ANY @NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){c==(pc),s==(ps+1)-> FM(c,s), pc=c, ps=s};
    };

(DISCLAIMER: I am a developer of UIMA Ruta)

Peter Kluegl
  • UIMA Ruta 2.5.0 is currently under review and not yet released, but there is a staging repository for the artifacts [here](https://repository.apache.org/content/repositories/orgapacheuima-1097/) – Peter Kluegl Jun 15 '16 at 09:45
  • You mentioned that the values of these features need to be stored in additional variables. Can you please explain how to do that? The above script throws errors when used as is. – Sugunalakshmi Pagemajik Jun 22 '16 at 07:52
  • Yes, I will add another answer for UIMA Ruta 2.4.0 since UIMA Ruta 2.5.0 is delayed. – Peter Kluegl Jun 23 '16 at 09:26
  • Is this what you meant? Even if I'm right, please help me with the correct syntax ... STRING c1,c2,s1,s2; FigureMention{->GETFEATURE("chapter",c1)} #{-> FigMissing} FigureMention{->GETFEATURE("chapter",c2)} (c1==c2, s1<(s2-1)); – Sindhu Venkatachary Jul 01 '16 at 06:31
  • Yes, this goes in the right direction, but you need to execute the boolean checks after the GETFEATURE actions are applied. See my extended answer. – Peter Kluegl Jul 07 '16 at 09:08
  • I guess that Pic 1.3 should not come under FigureMention, because of "Fig"-> FigureInd; – Sugunalakshmi Pagemajik Jul 07 '16 at 12:14
  • Why? That's the point of the last rule. You first detect missing figs, then the last rule finds them without needing the FigureInd. – Peter Kluegl Jul 07 '16 at 15:45

The following script creates an annotation with min and max values of the missing figures in UIMA Ruta 2.4.0:

DECLARE FigureInd;
DECLARE FigureMention (INT chapter, INT section);
DECLARE FigureMissing (INT minChapter, INT minSection, INT maxChapter, INT maxSection);

ACTION Mention(INT chap, INT sect) = CREATE(FigureMention, "chapter" = chap, "section" = sect);
ACTION Missing(INT minc, INT mins, INT maxc, INT maxs) = CREATE(FigureMissing, "minChapter" = minc, "minSection" = mins, "maxChapter" = maxc, "maxSection" = maxs);

"Fig"-> FigureInd;

INT c, s;
(FigureInd NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){-> Mention(c,s)};

DECLARE FigMissing;
INT c1,c2,s1,s2;
(FigureMention<-{FigureMention{-> ASSIGN(c1, FigureMention.chapter), ASSIGN(s1, FigureMention.section)};} 
    #{-> Missing(c1,s1+1,c2,s2-1)} 
    FigureMention<-{FigureMention{-> ASSIGN(c2, FigureMention.chapter), ASSIGN(s2, FigureMention.section)};}) 
    {c1 == (c2), s1 < (s2 - 1)};

In UIMA Ruta, there is no loop over boolean expressions (like while), only over existing annotations. This makes it more complicated to create a separate annotation for each missing Fig at the same offset. However, it can be done with a recursive BLOCK. The script in this answer instead creates one annotation that defines a range of missing figures.

For the example text of the question, two FigureMissing annotations are created:

FigureMissing
- begin: 41
- end: 112
- minChapter: 1
- minSection: 3
- maxChapter: 1
- maxSection: 3

FigureMissing
- begin: 123
- end: 165
- minChapter: 1
- minSection: 5
- maxChapter: 1
- maxSection: 5

If the second FigureMissing should not be created, an additional rule could remove it again depending on the existing FigureMentions. This would of course be much simpler if separate FigureMissing annotations had been created, e.g., with a BLOCK.
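
The range computation and the follow-up filtering can be illustrated with a short Python sketch. It is not Ruta code and not part of the script above: the function names `missing_ranges` and `truly_missing` are hypothetical, and the input is assumed to be the (chapter, section) pairs of the "Fig" mentions from the question in document order.

```python
def missing_ranges(mentions):
    """Return (chapter, min_section, max_section) for each gap between neighbours."""
    ranges = []
    for (c1, s1), (c2, s2) in zip(mentions, mentions[1:]):
        # Same check as in the rule: same chapter and at least one skipped section.
        if c1 == c2 and s1 < s2 - 1:
            ranges.append((c1, s1 + 1, s2 - 1))
    return ranges

def truly_missing(mentions):
    """Expand each range and drop numbers that are mentioned elsewhere in the text."""
    seen = set(mentions)
    return [(chap, sect)
            for chap, low, high in missing_ranges(mentions)
            for sect in range(low, high + 1)
            if (chap, sect) not in seen]

# "Fig" mentions of the question's input, in document order.
figs = [(1, 1), (1, 2), (1, 4), (1, 6), (1, 5)]
print(missing_ranges(figs))  # [(1, 3, 3), (1, 5, 5)]
print(truly_missing(figs))   # [(1, 3)]
```

The first result corresponds to the two ranges listed above; the second drops 1.5 because Fig 1.5 does occur later in the text, only out of order.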

DISCLAIMER: I am a developer of UIMA Ruta

Peter Kluegl