Questions tagged [uima]

UIMA (Unstructured Information Management Architecture) is an architecture for creating scalable applications that analyze and extract information from unstructured data sources such as text, audio, and video. Apache UIMA is an open-source Java framework implementing the UIMA architecture. UIMA applications typically use natural language processing (NLP) techniques to perform analysis.

UIMA is specified in an OASIS standard. Apache UIMA is based on code open-sourced by IBM, and UIMA was a central part of Watson, IBM's Jeopardy!-playing computer.

UIMA defines applications as Collection Processing Engines (CPEs). Each CPE includes a Collection Reader (CR), one or more Analysis Engines (AEs), and optionally a CAS Consumer.

A Collection is a repository of data to be analyzed, and it may take a number of forms, including RDBMS tables, a schema-less database, or a set of files on a filesystem. The first component in a CPE is the Collection Reader, which reads pieces of data from the Collection and packages each piece in a data structure called the Common Analysis Structure (CAS).

The CR passes CAS objects on to the first Analysis Engine in the pipeline. Each AE analyzes the information artifact packaged in a CAS, constructs annotations from the results of the analysis (e.g. parts of speech for words or phrases), and adds these annotations to the CAS before passing it on downstream. At the end of the pipeline, a CAS Consumer does something useful with the annotations, such as writing them to a database, or to files, or adding them to a semantic search index. Since version 2 of UIMA, the Apache UIMA documentation recommends using Analysis Engines instead of CAS Consumers, since AEs possess all of the required functionality for consuming CAS objects.
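The flow described above can be sketched in plain Java. This is only an illustration of the idea, not the Apache UIMA API: SimpleCas, Annotation, Engine, TOKENIZER, CAPITALIZED and run are hypothetical names made up for this sketch, and a real application would use the framework's own CAS and Analysis Engine types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {
    /** Hypothetical stand-in for an annotation: a typed span over the document text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for a CAS: the artifact plus the annotations added so far. */
    static final class SimpleCas {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        SimpleCas(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an Analysis Engine: reads the CAS and adds annotations. */
    interface Engine {
        void process(SimpleCas cas);
    }

    /** Adds one "Token" annotation per whitespace-delimited word. */
    static final Engine TOKENIZER = cas -> {
        int i = 0;
        int n = cas.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(cas.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(cas.text.charAt(i))) i++;
            if (i > start) cas.annotations.add(new Annotation("Token", start, i));
        }
    };

    /** Builds on the upstream Token annotations: marks tokens that start with an upper-case letter. */
    static final Engine CAPITALIZED = cas -> {
        List<Annotation> found = new ArrayList<>();
        for (Annotation a : cas.annotations) {
            if (a.type.equals("Token") && Character.isUpperCase(cas.text.charAt(a.begin))) {
                found.add(new Annotation("Capitalized", a.begin, a.end));
            }
        }
        cas.annotations.addAll(found);
    };

    /** Plays the reader role (one CAS per document) and passes each CAS through the engines in order. */
    static List<SimpleCas> run(List<String> collection, List<Engine> engines) {
        List<SimpleCas> processed = new ArrayList<>();
        for (String document : collection) {
            SimpleCas cas = new SimpleCas(document);
            for (Engine engine : engines) engine.process(cas);
            processed.add(cas);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<SimpleCas> result =
            run(Arrays.asList("UIMA powered Watson"), Arrays.asList(TOKENIZER, CAPITALIZED));
        // Plays the consumer role: print each annotation with the text it covers.
        for (SimpleCas cas : result) {
            for (Annotation a : cas.annotations) {
                System.out.println(a.type + ": " + cas.text.substring(a.begin, a.end));
            }
        }
    }
}
```

The second engine only works because the first one has already written its Token annotations into the same object, which is the point of passing a shared CAS down the pipeline.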

Each UIMA component has a descriptor in XML that defines its behavior and parameters. The descriptor for a Collection Processing Engine refers to the descriptors of each of its components and overrides their settings if desired.

UIMA supports conditional flow control, such that an annotation made in a CAS can determine which branch of a pipeline it takes downstream.
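As a rough illustration of annotation-driven routing, the plain-Java sketch below lets an upstream step record a language annotation and then picks the downstream branch from it. It does not use the Apache UIMA API (where this role belongs to a flow controller component). SimpleCas, Engine, route, the toy language rule and the engine names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlowSketch {
    /** Hypothetical stand-in for a CAS: text, annotation type names, and a trace of visited engines. */
    static final class SimpleCas {
        final String text;
        final Set<String> annotationTypes = new HashSet<>();
        final List<String> visited = new ArrayList<>();
        SimpleCas(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an Analysis Engine. */
    interface Engine {
        void process(SimpleCas cas);
    }

    /** Toy language detector: treats text containing "und" or "nicht" as German (illustrative only). */
    static final Engine LANGUAGE_DETECTOR = cas -> {
        cas.visited.add("languageDetector");
        boolean german = cas.text.matches(".*\\b(und|nicht)\\b.*");
        cas.annotationTypes.add(german ? "German" : "English");
    };

    // Placeholder branch engines that only record that they ran.
    static final Engine GERMAN_TAGGER = cas -> cas.visited.add("germanTagger");
    static final Engine ENGLISH_TAGGER = cas -> cas.visited.add("englishTagger");

    /** Runs the detector, then chooses the next engine from the annotation it left in the CAS. */
    static void route(SimpleCas cas) {
        LANGUAGE_DETECTOR.process(cas);
        Engine next = cas.annotationTypes.contains("German") ? GERMAN_TAGGER : ENGLISH_TAGGER;
        next.process(cas);
    }

    public static void main(String[] args) {
        SimpleCas german = new SimpleCas("Katzen und Hunde");
        SimpleCas english = new SimpleCas("cats and dogs");
        route(german);
        route(english);
        System.out.println(german.visited);
        System.out.println(english.visited);
    }
}
```

The routing decision reads only what is stored in the CAS, so the branch engines need no knowledge of how the language was detected.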

UIMA Asynchronous Scaleout is an add-on that enables a UIMA application to run many instances of an Analysis Engine to support higher throughput.
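The throughput idea can be shown with an in-process analogy: several independent engine instances work through documents in parallel. The sketch below uses a standard Java thread pool and is not how UIMA Asynchronous Scaleout itself is implemented or configured. TokenCounter and processAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScaleoutSketch {
    /** Hypothetical engine: counts whitespace-delimited tokens and keeps no shared state. */
    static final class TokenCounter {
        int process(String text) {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Processes documents on a pool of worker threads; results keep the input order. */
    static List<Integer> processAll(List<String> documents, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String document : documents) {
                // Each task creates its own engine instance, so no state is shared between threads.
                Callable<Integer> task = () -> new TokenCounter().process(document);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an engine instance failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a b c", "", "hello world"), 2));
    }
}
```

Keeping each engine instance free of shared state is what makes it safe to add more workers.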

418 questions
17
votes
3 answers

Accuracy: ANNIE vs Stanford NLP vs OpenNLP with UIMA

My work is planning on using a UIMA cluster to run documents through to extract named entities and what not. As I understand it, UIMA has very few NLP components packaged with it. I've been testing GATE for a while now and am fairly comfortable…
Drag
  • 171
  • 1
  • 4
15
votes
1 answer

How apache UIMA is different from Apache Opennlp

I have been doing some capability testing with Apache OpenNLP, which is capable of sentence detection, tokenization, and named entity recognition. Now, when I started looking at the UIMA documents, I saw it is mentioned on the UIMA home page - "language…
vashishth
  • 2,751
  • 4
  • 38
  • 68
9
votes
1 answer

Uima Ruta Out of Memory issue in spark context

I'm running a UIMA application on Apache Spark. There are millions of pages coming in batches to be processed by UIMA Ruta for calculation. But sometimes I'm facing an out of memory exception. It throws the exception sometimes as it successfully process…
Gaurav
  • 139
  • 1
  • 16
8
votes
2 answers

using cTAKES to parse clinical documents

I am trying to figure out how to run the Clinical Document Pipeline from Java. I have a set of clinical documents as plain texts. I want to parse these documents and extract a list of that is in document doc_ID, there is CUI with frequency of…
user2600417
  • 81
  • 1
  • 4
6
votes
2 answers

Examples for using Apache UIMA in a java program

I have been searching for examples of using Apache UIMA in a java program. Are there examples on how to use the example Annotators in a Java program ?
Arun R
  • 8,372
  • 6
  • 37
  • 46
6
votes
2 answers

SBT can't resolve dependency that exists on Sonatype repo

I'm attempting to include a dependency known as uimascala in my project. It's available on the Sonatype repository, but for some reason SBT can't find it. Here's my build.sbt. val sparkCore = "org.apache.spark" % "spark-core_2.10" %…
Sean Glover
  • 1,766
  • 18
  • 31
6
votes
2 answers

Java API for running UIMA Ruta scripts

I am new to UIMA Ruta. I made some annotators using scripting language. I am able to run them within EclipseIDE. I want to write a JAVA API to automatically run scripts on the input provided. I am using the same example project provided in UIMA…
Anshul
  • 83
  • 8
6
votes
3 answers

Accessing annotations in UIMA

Is there a way in UIMA to access the annotations from the tokens, the same way they do in their CAS debugger GUI? You can of course access all the annotations from the index repository, but I want to loop on the tokens, and get all associated…
Shady Hussein
  • 513
  • 8
  • 24
5
votes
3 answers

UIMA Example in Eclipse not working

I'm new to Eclipse and UIMA. I'm trying to run UIMA examples, in Eclipse Luna -j2ee platform. I can run cvd.sh from terminal in examples. When I try to run examples from "Run Configurations", I encounter error as below : Error: Could not find or…
5
votes
1 answer

Is UIMA provides only a wrapper or is it like StandfordCore NLP and GATE?

Stanford CoreNLP and GATE provide various NLP operations like NER and POS tagging. Some NLP operations, like the Tokenizer and Snowball Stemmer, are available as UIMA components. So, is UIMA comparable with Stanford CoreNLP/GATE…
Gaurav
  • 531
  • 1
  • 4
  • 15
5
votes
0 answers

Get next word (or POS) suggestion for a given sentence. Autocomplete a sentence

I have to implement an auto-suggestion feature in my desktop-based Java application. The requirement is as follows: a user will give a sentence as input and I have to return the next possible part of speech as a suggestion. Eg: 1. UserInput: Mike wants…
thekosmix
  • 1,705
  • 21
  • 35
5
votes
3 answers

How to remove UIMA annotations?

I'm using some UIMA annotators in a pipeline. It run tasks like: tokenizer sentence splitter gazetizer My Annotator The problem is that I don't want to write ALL the annotations (Token, Sentence, SubToken, Time, myAnnotations, etc..) to the…
German Attanasio
  • 22,217
  • 7
  • 47
  • 63
5
votes
1 answer

How to create an AnalysisEngineDescriptor from an uima-ruta script to use in a SimplePipeline

I'm not able to run an uima ruta script in my simple pipeline. I'm working with the next libraries: Uimafit 2.0.0 Uima-ruta 2.0.1 ClearTK 1.4.1 Maven And I'm using a org.apache.uima.fit.pipeline.SimplePipeline with: SimplePipeline.runPipeline( …
German Attanasio
  • 22,217
  • 7
  • 47
  • 63
4
votes
1 answer

Fuziness In UIMA ruta

Is there any option of fuzziness in case of word matching, or ignoring some special cases. For ex: STRINGLIST AMIMALLIST = {"LION","TIGER","MONKEY"}; DECLARE ANIMAL; Document {-> MARKFAST(ANIMAL, AMIMALLIST, true)}; I need to match words with…
Gaurav
  • 139
  • 1
  • 16
4
votes
1 answer

Document is ambiguous, use one of the following instead : org.apache.uima.ruta.type.Document uima.tcas.DocumentAnnotation

I'm using the Ruta annotation framework for annotating the input text. Previously I was using a Ruta script from the classpath. But according to a client requirement we have to move the Ruta script outside the code, so that all of this is decoupled from the system.…
Gaurav
  • 139
  • 1
  • 16