I can't seem to solve this problem, and it is driving me crazy.
I have two containers: Django and Celery. The user uploads a Word document, and the Celery worker converts it to PDF with libreoffice --headless and uploads the result to an S3 bucket. The user sends the file to an API endpoint, which saves the Word document in a folder called original, and convert_office_to_pdf.delay is then called, which needs to convert the file and put it into another folder called converted. Everything works as intended apart from the Celery function. This is how the code looks:
import json
import os
import subprocess

import websocket

def convert_office_to_pdf(original_file):
    ws = websocket.WebSocket()
    ws.connect('ws://web:8000/ws/converter/public/')
    # File name without directory or extension, used to name the PDF
    pure_file_name = os.path.splitext(os.path.basename(original_file))[0]
    # What the command will look like
    print('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file))
    subprocess.call('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file), shell=True)
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name),
        'progress': 75}))
    upload_file_to_s3(pure_file_name, 'pdf', ws)
However, the function gets executed and nothing happens. This is the output from docker-compose:
web_1 | [2018/03/22 22:57:52] HTTP GET /converter/ 200 [0.06, 172.17.0.1:32788]
web_1 | [2018/03/22 22:57:52] HTTP GET /static/css/normalize.css 304 [0.02, 172.17.0.1:32788]
web_1 | [2018/03/22 22:57:52] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.1:32798]
web_1 | [2018/03/22 22:57:52] WebSocket CONNECT /ws/converter/public/ [172.17.0.1:32798]
fileshiffty_data_1 exited with code 0
worker_1 | [2018-03-22 22:58:04,413: INFO/MainProcess] Received task: api.tasks.convert_office_to_pdf[287805aa-3c9c-4212-92d4-cac5872076f2]
worker_1 | [2018-03-22 22:58:04,414: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fb72d567e18> (args:('api.tasks.convert_office_to_pdf', '287805aa-3c9c-4212-92d4-cac5872076f2', {'lang': 'py', 'task': 'api.tasks.convert_office_to_pdf', 'id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'eta': None, 'expires': None, 'group': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'parent_id': None, 'argsrepr': "('1521759484.3458297-Doc1.docx',)", 'kwargsrepr': '{}', 'origin': 'gen8@a478d8966021', 'reply_to': 'adf32365-ef93-327e-842f-7eff10fda37a', 'correlation_id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}}, b'[["1521759484.3458297-Doc1.docx"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
web_1 | [2018/03/22 22:58:04] HTTP PUT /api/v1/fileupload/word/pdf/ 200 [0.07, 172.17.0.1:32788]
worker_1 | [2018-03-22 22:58:04,417: DEBUG/MainProcess] Task accepted: api.tasks.convert_office_to_pdf[287805aa-3c9c-4212-92d4-cac5872076f2] pid:9
web_1 | [2018/03/22 22:58:04] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.2:58928]
web_1 | [2018/03/22 22:58:04] WebSocket CONNECT /ws/converter/public/ [172.17.0.2:58928]
worker_1 | [2018-03-22 22:58:04,426: WARNING/ForkPoolWorker-2] /data/web/fileshiffty
worker_1 | [2018-03-22 22:58:04,427: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted
web_1 | {"message": "1521759484.3458297-Doc1.pdf", "progress": 50}
web_1 | {"message": "1521759484.3458297-Doc1.pdf", "progress": 75}
When I upload the file, I can confirm that it is added to the original folder, and the log entry worker_1 | [2018-03-22 22:58:04,427: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted shows the command that subprocess will call. However, when I look inside the converted folder, it is completely empty. The weird part is that when I open a bash shell in the Docker container and run the exact same command, the file gets converted and put into the folder, like so:
root@4b9da6f71226:/data/web/fileshiffty/api# python3
Python 3.6.4 (default, Mar 14 2018, 17:49:05)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.call('libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted', shell=True)
convert /data/web/fileshiffty/api/original/1521759484.3458297-Doc1.docx -> /data/web/fileshiffty/api/converted/1521759484.3458297-Doc1.pdf using writer_pdf_Export
0
Why does the subprocess call work when I run it from a shell inside the container, but not when it runs from the task? Could somebody please help me?
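One difference visible in the output above is the working directory: the worker prints /data/web/fileshiffty, while the interactive session ran from /data/web/fileshiffty/api, so a relative path such as original/... may resolve to a different location in each case. A minimal sketch for checking this before the conversion runs, assuming the original and converted folders sit in one known base directory (the helper names `resolve_conversion_paths` and `describe_paths` are hypothetical, not part of the project):

```python
import os


def resolve_conversion_paths(original_file, base_dir):
    """Build absolute input/output paths from an explicit base directory.

    Assumption: base_dir is the directory that contains the
    'original' and 'converted' folders.
    """
    source = os.path.join(base_dir, 'original', original_file)
    outdir = os.path.join(base_dir, 'converted')
    return source, outdir


def describe_paths(original_file, base_dir):
    """Collect facts worth logging before libreoffice is called."""
    source, outdir = resolve_conversion_paths(original_file, base_dir)
    return {
        'cwd': os.getcwd(),                       # where relative paths would resolve
        'source': source,
        'source_exists': os.path.isfile(source),  # False means the input was not found
        'outdir': outdir,
        'outdir_exists': os.path.isdir(outdir),
    }
```

Inside the task, something like `print(describe_paths(original_file, os.path.dirname(os.path.abspath(__file__))))` would show whether the input file is actually where the command expects it, independent of the worker's working directory.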
Edit: It seems that the subprocess command is not being executed at all. To find out what happens after the subprocess call, I changed the code to the following, this time using absolute paths:
def convert_office_to_pdf(original_file):
    ws = websocket.WebSocket()
    ws.connect('ws://web:8000/ws/converter/public/')
    pure_file_name = os.path.splitext(os.path.basename(original_file))[0]
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name),
        'progress': 50}))
    print(os.getcwd())
    print('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file))
    command = ['libreoffice', '--headless', '--convert-to', 'pdf', '{}/original/{}'.format(os.getcwd(), original_file), '--outdir', '{}/converted'.format(os.getcwd())]
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    out, err = process.communicate()
    print(out)
    print(err)
    print('------------------------------------------------')
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name),
        'progress': 75}))
    upload_file_to_s3(pure_file_name, 'pdf', ws)
and I get the following output:
[2018-03-22 23:44:54,668: DEBUG/MainProcess] Task accepted: api.tasks.convert_office_to_pdf[721ed2db-6a74-4fd2-9484-0fca14df7c01] pid:9
web_1 | [2018/03/22 23:44:54] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.2:60898]
web_1 | [2018/03/22 23:44:54] WebSocket CONNECT /ws/converter/public/ [172.17.0.2:60898]
worker_1 | [2018-03-22 23:44:54,696: WARNING/ForkPoolWorker-2] /data/web/fileshiffty
worker_1 | [2018-03-22 23:44:54,696: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521762293.8511283-Doc1.docx --outdir ./converted
web_1 | {"message": "1521762293.8511283-Doc1.pdf", "progress": 50}
worker_1 | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] b''
worker_1 | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] None
worker_1 | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] ------------------------------------------------
web_1 | {"message": "1521762293.8511283-Doc1.pdf", "progress": 75}
print(out) just prints an empty bytes object (b''), and print(err) just prints None (stderr was not piped in this call, so communicate() returns None for it).
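To see any error message the command produces, both output streams and the exit code can be captured. A minimal sketch, assuming Python 3.6 as in the interactive session above (the helper name `run_and_capture` and the default timeout are hypothetical choices, not part of the project):

```python
import subprocess


def run_and_capture(command, timeout=120):
    """Run a command given as an argument list.

    Returns (returncode, stdout, stderr) with both streams decoded to text,
    so an error written to stderr is not lost.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return (
        completed.returncode,
        completed.stdout.decode('utf-8', errors='replace'),
        completed.stderr.decode('utf-8', errors='replace'),
    )
```

In the task this could be called as `code, out, err = run_and_capture(command)` with the same `command` list as above. A non-zero `code` or a non-empty `err` would indicate that the command did run but failed, rather than not being executed.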
Edit 2: This is the docker-compose file:
web:
  restart: always
  tty: true
  build: ./web/
  working_dir: /data/web/fileshiffty
  expose:
    - "8000"
  ports:
    - "8000:8000"
  links:
    - postgres:postgres
    - redis:redis
  env_file: env
  volumes:
    - ./web:/data/web
  command: bash -c "python3 manage.py runserver 0.0.0.0:8000"
  # command: /usr/bin/gunicorn fileshiffty.wsgi:application -w 2 -b :8000
nginx:
  restart: always
  build: ./nginx/
  ports:
    - "80:80"
  volumes_from:
    - web
  links:
    - web:web
postgres:
  restart: always
  image: postgres:latest
  volumes_from:
    - data
  volumes:
    - ./postgres/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
    - ./backups/postgresql:/backup
  env_file:
    - env
  expose:
    - "5432"
redis:
  restart: always
  image: redis:latest
  expose:
    - "6379"
worker:
  build: ./web/
  working_dir: /data/web/fileshiffty
  command: bash -c "celery -A fileshiffty worker --loglevel=DEBUG"
  volumes:
    - ./web:/data/web
  links:
    - postgres:postgres
    - redis:redis
    - web:web
data:
  restart: always
  image: alpine
  volumes:
    - /var/lib/postgresql
  command: "true"