
We are having trouble retrieving data from an XMI file. The following excerpt illustrates what we are trying to do:

<uima:Token xmi:id="28" sofa="1" begin="3" end="6" pos="v-fin" features="PR=1S=IND" lexeme="sou">
    <lemma>ser</lemma>
</uima:Token>

We know how to obtain the information contained in the first line, such as id, begin, sofa and so on (these are attributes). They can be retrieved using the following code:

IMPORT opennlp.uima.Token FROM TypeSystem AS cgToken;
// ...
cgToken{REGEXP(cgToken.lexeme, "sou", true) -> DO_SOME_ACTION};
// do some action if the lexeme is "sou"

However, what we want is the lemma (the string "ser"), which in the example above sits in a child tag rather than in an attribute.

We have tried the obvious cgToken{REGEXP(cgToken.lemma, "ser", true) -> DO_SOME_ACTION};, which does not work because lemma is not an attribute of cgToken. Furthermore, there may be more than one lemma inside a single cgToken.

The TypeSystem defines this feature as follows:

<featureDescription>
    <name>lemma</name>
    <description>lemma</description>
    <rangeTypeName>uima.cas.StringArray</rangeTypeName>
</featureDescription>

However, the Ruta documentation does not explain how to access an array feature such as this one.
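
For comparison, here is a minimal sketch of reading the lemma values outside Ruta, directly from the XMI, where each entry of the StringArray appears as a separate lemma child element of the token. It uses only the Python standard library. The namespace URIs, the wrapping xmi:XMI root element and the function names token_lemmas and tokens_with_lemma are assumptions made for illustration, since the excerpt above does not show the xmlns declarations; the real file may differ. The matching here is case-insensitive and must cover the whole lemma.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the excerpt does not show the xmlns declarations,
# so replace these with the values declared in the real XMI file.
XMI_NS = "http://www.omg.org/XMI"
UIMA_NS = "http:///opennlp/uima.ecore"  # hypothetical

# The token from the question, wrapped in an assumed root element.
SAMPLE = f"""<xmi:XMI xmlns:xmi="{XMI_NS}" xmlns:uima="{UIMA_NS}">
<uima:Token xmi:id="28" sofa="1" begin="3" end="6" pos="v-fin" features="PR=1S=IND" lexeme="sou">
    <lemma>ser</lemma>
</uima:Token>
</xmi:XMI>"""


def token_lemmas(xmi_text):
    """Return a list of (lexeme, [lemma, ...]) pairs, one per Token element."""
    root = ET.fromstring(xmi_text)
    result = []
    for token in root.iter(f"{{{UIMA_NS}}}Token"):
        # Each array entry is a separate un-prefixed <lemma> child element.
        lemmas = [el.text or "" for el in token.findall("lemma")]
        result.append((token.get("lexeme"), lemmas))
    return result


def tokens_with_lemma(xmi_text, pattern):
    """Return the lexemes of tokens that have at least one lemma matching pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        lexeme
        for lexeme, lemmas in token_lemmas(xmi_text)
        if any(rx.fullmatch(lemma) for lemma in lemmas)
    ]


if __name__ == "__main__":
    print(token_lemmas(SAMPLE))
    print(tokens_with_lemma(SAMPLE, "ser"))
```

This only inspects the serialized file and does not help inside a Ruta script, which is where we actually need the value.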

Cogroo
    Operations on StringArrays or any other UIMA array are barely supported in UIMA Ruta 2.2.1 (current release). Until the language is extended in a future release, this functionality can only be achieved by using an adapted/extended type system, language extensions (e.g., a new condition) or additional analysis engines. – Peter Kluegl May 20 '15 at 08:47
