The following script, written for UIMA Ruta 2.4.0, creates an annotation that stores the min and max values of the missing figures:
DECLARE FigureInd;
DECLARE FigureMention (INT chapter, INT section);
DECLARE FigureMissing (INT minChapter, INT minSection, INT maxChapter, INT maxSection);
ACTION Mention(INT chap, INT sect) = CREATE(FigureMention, "chapter" = chap, "section" = sect);
ACTION Missing(INT minc, INT mins, INT maxc, INT maxs) = CREATE(FigureMissing, "minChapter" = minc, "minSection" = mins, "maxChapter" = maxc, "maxSection" = maxs);
"Fig"-> FigureInd;
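INT c, s;
// Parse "Fig <chapter>.<section>" and store both numbers as features.
(FigureInd NUM{PARSE(c)} PERIOD NUM{PARSE(s)}){-> Mention(c,s)};
DECLARE FigMissing;
INT c1,c2,s1,s2;
// For two consecutive mentions in the same chapter whose section numbers
// leave a gap, annotate the text in between with the missing range.
(FigureMention<-{FigureMention{-> ASSIGN(c1, FigureMention.chapter), ASSIGN(s1, FigureMention.section)};}
#{-> Missing(c1,s1+1,c2,s2-1)}
FigureMention<-{FigureMention{-> ASSIGN(c2, FigureMention.chapter), ASSIGN(s2, FigureMention.section)};})
{c1 == (c2), s1 < (s2 - 1)};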
In UIMA Ruta, there is no loop over boolean expressions (such as a while loop), only over existing annotations. This makes it more complicated to create a separate annotation for each missing figure at the same offset, although it can be done with a recursive BLOCK. The script in this answer instead creates one annotation that defines a range of missing figures.
For the example text of the question, two FigureMissing annotations are created:
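The range logic of the last rule can also be modelled outside Ruta. The following is only a rough Python sketch of what the rule computes, not Ruta code: the function names `missing_ranges` and `expand` and the input list are hypothetical, and the mentions are assumed to be given in document order as (chapter, section) pairs.

```python
from typing import List, Tuple

Mention = Tuple[int, int]                  # (chapter, section)
MissingRange = Tuple[int, int, int, int]   # (minChapter, minSection, maxChapter, maxSection)


def missing_ranges(mentions: List[Mention]) -> List[MissingRange]:
    """Model of the rule: compare each mention with the next one."""
    ranges = []
    for (c1, s1), (c2, s2) in zip(mentions, mentions[1:]):
        # Same condition as in the script: c1 == c2 and s1 < s2 - 1
        if c1 == c2 and s1 < s2 - 1:
            ranges.append((c1, s1 + 1, c2, s2 - 1))
    return ranges


def expand(r: MissingRange) -> List[Mention]:
    """List every missing figure of one range (same chapter assumed)."""
    chapter, min_section, _, max_section = r
    return [(chapter, s) for s in range(min_section, max_section + 1)]


# Hypothetical input, not the text of the question.
example = [(1, 1), (1, 2), (1, 4), (1, 6)]
print(missing_ranges(example))
```

In Python the expansion into single figures is a plain loop, which is exactly the part that needs a recursive BLOCK in Ruta.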
FigureMissing
- begin: 41
- end: 112
- minChapter: 1
- minSection: 3
- maxChapter: 1
- maxSection: 3
FigureMissing
- begin: 123
- end: 165
- minChapter: 1
- minSection: 5
- maxChapter: 1
- maxSection: 5
If the second FigureMissing should not be created, an additional rule could remove it again depending on the existing FigureMentions. This would of course be much simpler if separate FigureMissing annotations had been created, e.g., with a BLOCK.
DISCLAIMER: I am a developer of UIMA Ruta