cTAKES parser output

Question

I am trying to understand the result generated via cTAKES parser. I am unable to understand certain points-

cTAKES parser is invoked via TIKa-app we get following result-

ctakes:AnatomicalSiteMention: liver:77:82:C1278929,C0023884
ctakes:ProcedureMention: CT scan:24:31:C0040405,C0040405,C0040405,C0040405
ctakes:ProcedureMention: CT:24:26:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405
ctakes:ProcedureMention: scan:27:31:C0034606,C0034606,C0034606,C0034606,C0441633,C0034606,C0034606,C0034606,C0034606,C0034606,C0034606
ctakes:RomanNumeralAnnotation: did:47:50:
ctakes:SignSymptomMention: lesions:62:69:C0221198,C0221198
ctakes:schema: coveredText:start:end:ontologyConceptArr
resourceName: sample

and document parsed contains -

The patient underwent a CT scan in April which did not reveal lesions in his liver

i have following questions-

why UMLS id is repeated like in ctakes:ProcedureMention: scan:27:31:C0009244,C0009244,C0040405,C0040405,C0009244,C0009244,C0040405,C0009244,C0009244,C0009244,C0040405? (cTAKES configuration properties file has annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR)
what does RomanNumeralAnnotation indicate?
In concept unique identifier like C0040405, do these 7 numbers have any meaning. How are these generated?

System information:

Apache tika 1.10

Apache ctakes 3.2.2

cTAKES parser output

0 Answers0