
I am new to JAPE (Java Annotation Patterns Engine), a part of GATE.

I have already written some rules whose LHS patterns produce some tags (say tags a, b, and c) in the text.

My text consists of several parts, and I would like to classify each part based on the generated tags.

As an illustration:

<record id=001>
lorem <a>ipsum</a> dolor sit amet
</record>
<record id=002>
consectetur <b>adipiscing</b> elit, sed do eiusmod <a>tempor</a> incididunt ut labore et dolore magna aliqua
</record>
<record id=003>
Ut enim ad minim veniam, quis <a>nostrud</a> exercitation <c>ullamco</c> laboris nisi ut aliquip ex ea commodo consequat.
</record>

As you can see, each record can contain more than one of the tags generated by the LHS.

I would like to classify each record based on the tag inside it.

For example, if a record contains tag a, classify it as A. If it contains both a and b, still classify it as A, because a takes precedence over b.

I gather that I should handle this in the RHS, but I have no idea how to write it.

Could you please give me a clue or something?

Thanks.

Regards.

A. U.

1 Answer


Building if-else logic in a JAPE grammar does not always require Java in the RHS.

When writing the rules, it is often convenient to divide processing into multiple stages: each stage produces some results, which are then passed to the next stages. So, based on what you described, the processing could be divided into the following three phases.

  1. RecordFinder, which returns the records within the document, i.e. Record annotations.
  2. TagFinder, which returns tags a and b within the document.
  3. Intersection, which searches for tags a and b within the records.

File Main.jape

MultiPhase: Main
Phases: 
RecordFinder
TagFinder
Intersection

File RecordFinder.jape

This phase annotates the records in your document. Its single rule reads Token annotations (produced by the tokeniser) as input, finds each <record ...> ... </record> span, and creates a Record annotation over it.

Note that the control option is set to first, because the aim is to stop at the first match of the sequence: an opening <record ...> tag, followed by zero or more tokens, followed by a closing </record> tag. With this style the rule fires as soon as a match is found and does not try to extend it to a later closing tag.

Phase: RecordFinder
Input: Token
Options: control = first debug = true


// Matches a whole record, from the opening <record ...> tag to the closing </record> tag.
Rule: RuleToFindRecord
(
    ({Token.string == "<"} {Token.string == "record"} ({Token})* {Token.string == ">"})
    ({Token})*
    ({Token.string == "<"} {Token.string == "/"} {Token.string == "record"} {Token.string == ">"})
):match
-->
:match.Record = { rule = "RuleToFindRecord" }

File TagFinder.jape

This phase reads Token annotations as input, finds tags a and b within the text, and creates a and b annotations over them.

Phase: TagFinder
Input: Token
Options: control = first debug = true


// Matches tag "a", written either as <a>...</a> or as the self-closing <a/>.
Rule: RuleToFindTag_a
(
    (
        ({Token.string == "<"} {Token.string == "a"} {Token.string == ">"})
        ({Token})*
        ({Token.string == "<"} {Token.string == "/"} {Token.string == "a"} {Token.string == ">"})
    )
    |
    ({Token.string == "<"} {Token.string == "a"} {Token.string == "/"} {Token.string == ">"})
):match
-->
:match.a = { rule = "RuleToFindTag_a" }


// Matches tag "b", written either as <b>...</b> or as the self-closing <b/>.
Rule: RuleToFindTag_b
(
    (
        ({Token.string == "<"} {Token.string == "b"} {Token.string == ">"})
        ({Token})*
        ({Token.string == "<"} {Token.string == "/"} {Token.string == "b"} {Token.string == ">"})
    )
    |
    ({Token.string == "<"} {Token.string == "b"} {Token.string == "/"} {Token.string == ">"})
):match
-->
:match.b = { rule = "RuleToFindTag_b" }
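
The question also mentions a tag c, which the rules above do not cover. A rule for it could follow the same pattern. The sketch below assumes the same tokenisation as for a and b, and the rule name RuleToFindTag_c is only illustrative; it would be appended to TagFinder.jape.

```
// Hypothetical addition to TagFinder.jape.
// Matches tag "c", written either as <c>...</c> or as the self-closing <c/>.
Rule: RuleToFindTag_c
(
    (
        ({Token.string == "<"} {Token.string == "c"} {Token.string == ">"})
        ({Token})*
        ({Token.string == "<"} {Token.string == "/"} {Token.string == "c"} {Token.string == ">"})
    )
    |
    ({Token.string == "<"} {Token.string == "c"} {Token.string == "/"} {Token.string == ">"})
):match
-->
:match.c = { rule = "RuleToFindTag_c" }
```

For a later phase to use these annotations, c would also have to be listed on that phase's Input line.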

File Intersection.jape

This phase reads the Record, a and b annotations as input and searches for tags a or b within each Record. The JAPE chapter of the GATE user guide describes the contains and within operators; the rules below use contains.

Phase: Intersection
Input: Record a b
Options: control = first debug = true


// Matches a record that contains both tag a and tag b.
Rule: Rule_1
(
    {Record contains a, Record contains b}
):match
-->
:match.Record_with_both_tags = { rule = "Rule_1" }


// Matches a record that contains tag a.
Rule: Rule_2
(
    {Record contains a}
):match
-->
:match.Record_with_tag_a = { rule = "Rule_2" }
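
To get a single label per record with a taking precedence over b, as the question asks, one option is to let rule priorities play the role of the if-else. The following is a sketch rather than a tested grammar: the phase name Classify, the annotation type RecordClass and the feature name category are made up for illustration. It relies on the appelt control style, in which, among rules that match the same span, the one with the highest Priority fires.

```
Phase: Classify
Input: Record a b
Options: control = appelt

// Highest priority: any record containing tag a is classified as A,
// whether or not it also contains tag b.
Rule: ClassifyAsA
Priority: 20
(
    {Record contains a}
):match
-->
:match.RecordClass = { category = "A", rule = "ClassifyAsA" }

// Lower priority: only fires for records where ClassifyAsA does not match.
Rule: ClassifyAsB
Priority: 10
(
    {Record contains b}
):match
-->
:match.RecordClass = { category = "B", rule = "ClassifyAsB" }
```

Saved as Classify.jape (a hypothetical file name), this phase could be listed in Main.jape after TagFinder, in place of or alongside Intersection. A further tag such as c could be handled the same way, by adding it to Input and giving its rule a still lower priority.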
enzom83