Fuzziness in UIMA Ruta

Question

Is there any option for fuzzy word matching, or for ignoring certain special characters during matching?

For example:

STRINGLIST AMIMALLIST = {"LION","TIGER","MONKEY"};
DECLARE ANIMAL;


Document {-> MARKFAST(ANIMAL, AMIMALLIST, true)};

I need the words to match the list even when they are followed by a special character, like

Tiger- or MONKEY$

According to the documentation, there are different evaluators. Any idea how to use them? Or can I use SCORE or MARKSCORE?

@PeterKluegl can you help here? – Gaurav, Aug 18 '17 at 05:16
Yes, I'll add an answer in the next few days. – Peter Kluegl, Aug 20 '17 at 19:27

Peter Kluegl · Answer 1 · 2017-08-25T21:02:35.683

There are several aspects to consider here. In general, UIMA Ruta does not support fuzziness in the dictionary lookup. SCORE and MARKSCORE are language elements that can be used to introduce some heuristic scoring (not really fuzziness) in sequential rules. For the examples you gave in your question, you do not really need fuzzy matching.

The dictionary lookup in UIMA Ruta works on the RutaBasic annotations. These annotations are automatically created and maintained by UIMA Ruta itself (and should not be changed by other analysis engines or rules directly). The RutaBasic annotations represent the smallest fragments that annotations refer to. By default, the seeder of the RutaEngine creates annotations for words (W -> CW, SW, CAP) and many other tokens, such as SPECIAL for - or $. This means that each of these tokens also gets its own RutaBasic annotation, and that the dictionary lookup can distinguish between the word and the special character. As a result, Tiger and MONKEY should be annotated, and the example in your question should actually work (I tested it). You may need some postprocessing in order to include the SPECIAL in ANIMAL.
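If the trailing special character should become part of the annotation, the postprocessing mentioned above could look roughly like the following. This is only a sketch, not tested against your data: it assumes the default seeder and default filtering, reuses the declarations from the question, and relies on the SHIFT action to expand an existing ANIMAL annotation over the SPECIAL token that follows it.

```
STRINGLIST AMIMALLIST = {"LION","TIGER","MONKEY"};
DECLARE ANIMAL;

// Dictionary lookup as in the question (third argument: ignore case).
Document{-> MARKFAST(ANIMAL, AMIMALLIST, true)};

// Postprocessing: expand ANIMAL so that it spans rule elements 1 to 2,
// i.e. the animal word plus the SPECIAL token that follows it.
ANIMAL{-> SHIFT(ANIMAL, 1, 2)} SPECIAL;
```

With this, Tiger- and MONKEY$ would each end up as a single ANIMAL annotation that includes the special character. If you would rather keep the original annotation unchanged, a second type marked over the sequence is an alternative to SHIFT.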

I have to mention that there is also the functionality to use an edit distance in the dictionary lookup (Multi Tree Word List, TRIE). However, this functionality has not been maintained for several years. It should also support different weights for specific replacements. I do not know if this counts as fuzziness.

DISCLAIMER: I am a developer of UIMA Ruta
