I'm running tika-server-1.23.jar with Tesseract and extracting text from files using curl via PHP. Requests sometimes take too long when OCR runs, so I'd like to skip Tesseract for some of them. I can do this by inserting
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
in the Tika config XML file, but then Tesseract never runs for any request.
Can I make the Tika server skip Tesseract selectively, per request, via curl? If so, how?
I've got a workaround where I'm running two instances of the Tika server, each with a different config file and listening on a different port, but this is sub-optimal.
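For reference, here is a rough shell sketch of that two-instance workaround (my real calls go through PHP's curl bindings). The ports 9998 and 9999, the config file name tika-config-no-ocr.xml, and the helper names tika_url and extract_text are placeholders for illustration, not my exact setup.

```shell
# Assumed setup (placeholder ports and config file name):
#   java -jar tika-server-1.23.jar --port 9998
#   java -jar tika-server-1.23.jar --config tika-config-no-ocr.xml --port 9999
# The second instance uses the config containing the parser-exclude line above.

TIKA_OCR_URL="http://localhost:9998/tika"      # instance with Tesseract enabled
TIKA_NO_OCR_URL="http://localhost:9999/tika"   # instance with Tesseract excluded

# Print the endpoint for the requested mode: "ocr" selects the OCR instance,
# anything else selects the instance without Tesseract.
tika_url() {
  if [ "$1" = "ocr" ]; then
    printf '%s\n' "$TIKA_OCR_URL"
  else
    printf '%s\n' "$TIKA_NO_OCR_URL"
  fi
}

# Upload a file with PUT and print the extracted plain text.
#   $1 = mode ("ocr" or "no-ocr"), $2 = path to the file
extract_text() {
  curl -s -T "$2" -H "Accept: text/plain" "$(tika_url "$1")"
}
```

With this, `extract_text ocr scan.pdf` goes to the OCR instance and `extract_text no-ocr scan.pdf` goes to the other one. It works, but it means keeping two JVMs and two config files in sync.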
Thanks in advance.