
I have a web app that uses tika-python. It works fine, and each time I start it, it downloads two files, "tika-server.jar" and "tika-server.jar.md5", to the local machine and then parses files. But sometimes it is unable to download those files, so the service doesn't work at all.

I have downloaded both files to ./temp and want to use those copies instead of downloading them again on every start, which takes a long time and sometimes fails.

I have tried Docker Compose, but that is also not working. My Dockerfile so far:

FROM python:3.8-slim
WORKDIR /app
COPY ./templates /app/templates
COPY ./temp /app/temp
COPY ./app.py /app/app.py
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-deps --no-cache-dir -r requirements.txt && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get -y install default-jre-headless && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
#ENV
ENV TIKA_SERVER_JAR=./temp/tika-server.jar
ENV TIKA_PATH=./temp
#PORT
EXPOSE 5000

# configure the container to run in an executed manner
ENTRYPOINT [ "python", "app.py" ]

My Python app.py script:

import os
from tika import parser
os.environ['TIKA_SERVER_JAR'] = './temp/tika-server.jar'
os.environ['TIKA_PATH'] = './temp'
text = parser.from_file(file, service='text')['content']
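
One thing worth checking, offered as a sketch rather than a confirmed fix: as far as I can tell, tika-python reads TIKA_SERVER_JAR and TIKA_PATH when the tika package is first imported, so assigning them after `from tika import parser` may have no effect. The jar location also appears to be treated as a URL, so a local file would need a file:// URI. The helper names `configure_local_tika` and `extract_text`, and the `/app/temp` default, are illustrative assumptions and not part of tika-python.

```python
import os
from pathlib import Path


def configure_local_tika(jar_dir):
    """Point tika-python at a pre-downloaded jar (hypothetical helper).

    Call this BEFORE the first `import tika` anywhere in the process.
    Returns True if tika-server.jar is present in jar_dir.
    """
    jar_dir = Path(jar_dir).resolve()  # absolute path, independent of the cwd
    jar = jar_dir / "tika-server.jar"
    # Directory where tika-python looks for the jar and its .md5 file.
    os.environ["TIKA_PATH"] = str(jar_dir)
    # Assumption: the jar location is parsed as a URL, so use a file:// URI.
    os.environ["TIKA_SERVER_JAR"] = jar.as_uri()
    return jar.is_file()


def extract_text(path, jar_dir="/app/temp"):
    """Extract plain text from a file using the local Tika jar."""
    if not configure_local_tika(jar_dir):
        raise FileNotFoundError(f"tika-server.jar not found in {jar_dir}")
    # Imported only now, after the environment variables are in place.
    from tika import parser
    return parser.from_file(path, service="text")["content"]
```

With this layout, app.py would call `extract_text(file)` instead of importing `tika.parser` at the top of the module. The absolute path avoids any dependence on the working directory the container starts in.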

Everything works when the files are downloaded at startup, but when I try to use the local files, nothing works. I have tried different combinations of environment variables. I am new to Docker and Linux commands.

Any help will be appreciated.

Environment variables in the container: {'GPG_KEY': 'E3FF2839C048B25C084DEBE9B2*************68', 'HOME': '/root', 'HOSTNAME': '2d43d*****', 'LANG': 'C.UTF-8', 'PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'PYTHON_GET_PIP_SHA256': '5aefe6ade911d997af080b315ebcb7f882212d070465df544e1175ac2be519b4', 'PYTHON_GET_PIP_URL': 'https://github.com/pypa/get-pip/raw/5eaac1050023df1f5c98b173b248c260023f2278/public/get-pip.py', 'PYTHON_PIP_VERSION': '22.0.4', 'PYTHON_SETUPTOOLS_VERSION': '57.5.0', 'PYTHON_VERSION': '3.8.13', 'TIKA_PATH': './my_project', 'TIKA_SERVER_JAR': './my_project/tika-server.jar'}

Garuda
  • That's all your script? – Klaus D. Sep 05 '22 at 05:20
  • Do you have any logs from the failed container? `docker logs ` – Michal Racko Sep 05 '22 at 06:30
  • 2022-09-05 10:12:24,790 [Thread-3 ] [WARNI] Failed to see startup log message; retrying... 2022-09-05 10:12:29,796 [Thread-3 ] [WARNI] Failed to see startup log message; retrying... 2022-09-05 10:12:34,800 [Thread-3 ] [WARNI] Failed to see startup log message; retrying... 2022-09-05 10:12:39,804 [Thread-3 ] [ERROR] Tika startup log message not received after 3 tries. 2022-09-05 10:12:39,805 [Thread-3 ] [ERROR] Failed to receive startup confirmation from startServer. ERROR: Unable to start Tika server. – Garuda Sep 05 '22 at 07:58
  • Hello Klaus D., I am working on an NLP project that is working as expected. I just need the text output, which takes only a few lines: `from tika import parser` and `text = parser.from_file(file, service='text')['content']`. That's all I am stuck with. – Garuda Sep 05 '22 at 08:00
