There are several aspects to consider here. In general, UIMA Ruta does not support fuzziness in the dictionary lookup. SCORE and MARKSCORE are language elements that can be used to introduce some heuristic scoring (not really fuzziness) in sequential rules. In the examples you gave in your question, you do not really need fuzzy matching.
The dictionary lookup in UIMA Ruta works on the RutaBasic annotations. These annotations are automatically created and maintained by UIMA Ruta itself (and should not be changed by other analysis engines or rules directly). The RutaBasic annotations represent the smallest fragments that annotations refer to. By default, the seeder of the RutaEngine creates annotations for words (W -> CW, SW, CAP) and many other tokens like SPECIAL for - or $. This means that there is also a RutaBasic annotation for each of these tokens, and that the dictionary lookup can distinguish between them. As a result, Tiger and Monkey should be annotated and the example in your question should actually work (I tested it). You may need some postprocessing in order to include the SPECIAL in ANIMAL.
I have to mention that there is also the functionality to use an edit distance in the dictionary lookup (Multi Tree Word List, TRIE). However, this functionality has not been maintained for several years. It should also support different weights for specific replacements. I do not know if this counts as fuzziness.
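To illustrate what an edit distance with different weights for specific replacements means in general, here is a minimal Python sketch. It is not the UIMA Ruta implementation and does not use any Ruta API; the function names, the `replace_costs` mapping, and the sample words are assumptions made up for this example.

```python
def weighted_edit_distance(source, target, replace_costs=None, default_cost=1.0):
    """Levenshtein distance where selected replacements get a custom cost.

    replace_costs is a hypothetical mapping {(from_char, to_char): cost};
    any replacement not listed there costs default_cost.
    """
    replace_costs = replace_costs or {}
    # previous[j]: cost of turning the processed prefix of source into target[:j]
    previous = [j * default_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * default_cost]
        for j, t_char in enumerate(target, start=1):
            if s_char == t_char:
                substitution = 0.0
            else:
                substitution = replace_costs.get((s_char, t_char), default_cost)
            current.append(min(
                previous[j] + default_cost,      # delete s_char
                current[j - 1] + default_cost,   # insert t_char
                previous[j - 1] + substitution,  # replace (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(word, dictionary, max_distance, replace_costs=None):
    """Return all dictionary entries within max_distance of word."""
    return [entry for entry in dictionary
            if weighted_edit_distance(word, entry, replace_costs) <= max_distance]


if __name__ == "__main__":
    animals = ["Tiger", "Monkey"]
    cheap_vowel_swap = {("a", "e"): 0.5}  # assumed weight, for illustration only
    print(fuzzy_lookup("Tigar", animals, 0.5, cheap_vowel_swap))
    print(fuzzy_lookup("Tigar", animals, 0.5))
```

With the cheaper a-to-e replacement, "Tigar" stays within a maximum distance of 0.5 of "Tiger"; with uniform costs the same replacement costs 1.0 and the lookup finds nothing.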
DISCLAIMER: I am a developer of UIMA Ruta