
I am trying to figure out how to run the cTAKES Clinical Document Pipeline from Java. I have a set of clinical documents as plain text. I want to parse these documents and extract a list of entries of the form: in document doc_ID, the concept CUI occurs with frequency freq. I spent several days installing cTAKES and looking for a solution. I narrowed it down to ClinicalPipelineWithUmls.java, which takes a test text and runs SimplePipeline with an AnalysisEngineDescription. Here is part of the code:

String documentText = "Text of document to test goes here, such as the following. No edema, some soreness, denies pain.";
InputStream inStream = InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
CollectionReader collectionReader = InputStreamCollectionReader.getCollectionReader(inStream);
AnalysisEngineDescription pipelineIncludingUmlsDictionaries = AnalysisEngineFactory.createAnalysisEngineDescription(
            "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
AnalysisEngineDescription xWriter = AnalysisEngineFactory.createPrimitiveDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME,
            AssertionConst.evalOutputDir,
            XWriter.PARAM_XML_SCHEME_NAME,
            XWriter.XMI,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME,
            CtakesFileNamer.class.getName());
SimplePipeline.runPipeline(collectionReader, pipelineIncludingUmlsDictionaries, xWriter);
System.out.println("Done at " + new Date());

The problem is that "InputStreamCollectionReader" cannot be found. I have searched for it without success so far. Could you give me a hint or point me in the right direction? Thanks for any help!
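
For reference, the (doc_ID, CUI, freq) output described above can be sketched independently of cTAKES. This is only a minimal sketch of the counting step: the class `CuiFrequency` and its methods are hypothetical helpers, not part of cTAKES, and the CUI strings are placeholders. In a real run, each `add` call would be driven by the concept annotations cTAKES produces for a document.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a cTAKES class): tallies CUI occurrences per document.
class CuiFrequency {

    // doc_ID -> (CUI -> freq); TreeMap keeps the output in a stable, sorted order.
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Record one occurrence of a CUI in a document.
    void add(String docId, String cui) {
        counts.computeIfAbsent(docId, k -> new TreeMap<>())
              .merge(cui, 1, Integer::sum);
    }

    Map<String, Map<String, Integer>> counts() {
        return counts;
    }

    public static void main(String[] args) {
        CuiFrequency freq = new CuiFrequency();
        // Placeholder observations; real ones would come from the pipeline output.
        freq.add("doc1", "C0000001");
        freq.add("doc1", "C0000001");
        freq.add("doc1", "C0000002");
        freq.add("doc2", "C0000001");

        // Print one "doc_ID <tab> CUI <tab> freq" line per pair.
        freq.counts().forEach((docId, byCui) ->
            byCui.forEach((cui, n) ->
                System.out.println(docId + "\t" + cui + "\t" + n)));
    }
}
```

Nested maps keep the lookup by document cheap and make it easy to write one output row per (document, CUI) pair.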

Franck Dernoncourt
user2600417

2 Answers


Is there any particular reason why you want to use InputStreamCollectionReader? Otherwise, there are examples on how to use TextReader here.

Renaud
  • Thank you for your response, Renaud. Yes, I'm using cTAKES to extract the UMLS CUI (Concept Unique Identifier) related to each word. I found this code in the cTAKES documentation. However, "InputStreamCollectionReader" cannot be found. I'm new to the Maven and Eclipse world. Sorry if it is a stupid question! I appreciate any comments and hints. – user2600417 Oct 23 '13 at 16:56
  • OK, have you tried using `TextReader` instead? It should work for you. – Renaud Oct 24 '13 at 12:22

We have implemented a REST service for cTAKES that lets us send clinical text in a request and get the analyzed output back as a JSON response.

You can have a look at the cTAKES REST module in the following GitHub repo. I feel this should be the way to go for cTAKES users who are interested in web access.
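
To give an idea of what calling such a service from Java might look like, here is a minimal sketch using the JDK's `java.net.http` client (Java 11+). The class `CtakesRestClient`, the endpoint URL, and the `text/plain` content type are assumptions for illustration only; check the REST module's own documentation for the actual path, parameters, and request format.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client wrapper; not part of cTAKES or its REST module.
class CtakesRestClient {

    // Build a POST request carrying the clinical text as the body.
    // The endpoint is supplied by the caller because its real value is an assumption here.
    static HttpRequest buildRequest(String endpoint, String clinicalText) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/plain") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(clinicalText))
                .build();
    }

    // Send the text and return the raw response body (expected to be JSON).
    static String analyze(String endpoint, String clinicalText)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, clinicalText),
                            HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call would then look like `CtakesRestClient.analyze("http://localhost:8080/your-ctakes-endpoint", "No edema, some soreness, denies pain.")`, where the URL is a placeholder for wherever the service is deployed.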

Gandhi