
I'm executing a Ruta script dynamically from a Java Maven project. The script annotates an HTML file, and the output is processed further. The problem is that the coveredText of an annotation contains the HTML tags, as shown below:

(a+b)<SUP>2</SUP> ==> is MARKed as formula

But I want it to be

(a+b)2 ==> where the superscript is captured as a separate annotation and handled later.

How can I achieve this?

Sindhu Venkatachary
  • In UIMA, the document text is static. If you want to change the text, you need to create a new view/CAS. In Ruta, there are three components that can create a CAS with modified document text: HtmlConverter, RutaModifier, and RutaCutter. If you want to process it further, you need an aggregate AE with sofa mapping. – Peter Kluegl Jun 02 '16 at 08:18
  • How do I do that? Please help me with some code or links. Thanks! – Sindhu Venkatachary Jun 03 '16 at 04:24

1 Answer


In UIMA, the document text is static. If you want to change the text, you need to create a new view/CAS. In Ruta, three components can create a CAS with modified document text: HtmlConverter, RutaModifier, and RutaCutter. If you want to process the result further in the same pipeline, you need an aggregate AE with sofa mapping (or a sofa-aware analysis engine).

There is some documentation about these analysis engines and their usage. There is also an example project of these rules and a StackOverflow question that discusses some possible problems. Information about sofa mapping can be found in the UIMA documentation.
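
To make the "new view with modified text" idea concrete, here is a minimal plain-Java sketch. It deliberately does not use the UIMA or Ruta API; the class and method names (SuperscriptViewSketch, convert, Span, ConvertedView) are hypothetical and only illustrate the principle: the original text stays untouched, a second tag-free text is built, and the superscript is kept as a separate span whose offsets refer to the new text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only: it does not use UIMA or Ruta, and all names are hypothetical.
class SuperscriptViewSketch {

    // Stand-in for an annotation: a type name plus offsets into the converted text.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Stand-in for a second view: the tag-free text and the spans that refer to it.
    static final class ConvertedView {
        final String text;
        final List<Span> spans;

        ConvertedView(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    // Opening or closing tag; tolerates stray spaces such as "< SUP >" or "< /SUP>".
    // Simplification: a literal "<" in running text could also be mistaken for a tag.
    private static final Pattern TAG =
            Pattern.compile("<\\s*(/?)\\s*([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static ConvertedView convert(String html) {
        StringBuilder plain = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int last = 0;        // end of the previous tag in the original text
        int supBegin = -1;   // offset in the NEW text where an open SUP started
        while (m.find()) {
            plain.append(html, last, m.start()); // copy the text between tags
            last = m.end();
            if (!"SUP".equalsIgnoreCase(m.group(2))) {
                continue; // other tags are dropped without creating a span
            }
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                supBegin = plain.length();
            } else if (supBegin >= 0) {
                spans.add(new Span("Superscript", supBegin, plain.length()));
                supBegin = -1;
            }
        }
        plain.append(html, last, html.length()); // trailing text after the last tag
        return new ConvertedView(plain.toString(), spans);
    }

    public static void main(String[] args) {
        ConvertedView view = convert("(a+b)<SUP>2</SUP>");
        System.out.println(view.text);
        for (Span s : view.spans) {
            System.out.println(s.type + " [" + s.begin + ", " + s.end + "): "
                    + view.text.substring(s.begin, s.end));
        }
    }
}
```

In an actual pipeline, the converted text would live in a new view produced by one of the components named above rather than in a hand-written class like this one; the sketch only shows why the offsets have to be recomputed for the new text.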

(DISCLAIMER: I am a developer of UIMA Ruta)

Peter Kluegl