
I'm literally going crazy and pulling my hair out because I can't seem to solve this particular problem.

So here's the problem: I have two containers, Django and Celery. The user uploads a Word document, and the Celery worker converts it to PDF and uploads it to an S3 bucket. I'm using libreoffice --headless for the conversion. The user sends the file to an API endpoint, which saves the Word document in a folder called original and then calls convert_office_to_pdf.delay, which should convert the file and put the result in another folder, converted. Everything works as intended apart from the Celery task. This is how the code looks:

import json
import os
import subprocess

import websocket  # websocket-client package


def convert_office_to_pdf(original_file):
    ws = websocket.WebSocket()
    ws.connect('ws://web:8000/ws/converter/public/')
    pure_file_name = os.path.splitext(os.path.basename(original_file))[0]
    # how the command will look
    print('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file))
    subprocess.call('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file), shell=True)
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name),
        'progress': 75}))
    upload_file_to_s3(pure_file_name, 'pdf', ws)  # helper defined elsewhere in the project
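One low-risk change to the call above is to pass the arguments as a list instead of a formatted string with shell=True, so a filename containing spaces or shell metacharacters is never split or interpreted by the shell. This is a minimal sketch; build_convert_command and convert are hypothetical helper names, and the libreoffice flags are the same ones used in the question.

```python
import subprocess
from typing import List


def build_convert_command(source_path: str, outdir: str) -> List[str]:
    """Build the libreoffice argv from the question (hypothetical helper)."""
    return [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        source_path,
        "--outdir", outdir,
    ]


def convert(source_path: str, outdir: str) -> int:
    # No shell=True: each list element reaches libreoffice as exactly one argument.
    return subprocess.call(build_convert_command(source_path, outdir))
```

For example, build_convert_command("original/my doc.docx", "./converted") keeps "original/my doc.docx" as a single argument.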

However, the function gets executed and nothing happens. This is the output from docker-compose:

web_1       | [2018/03/22 22:57:52] HTTP GET /converter/ 200 [0.06, 172.17.0.1:32788]
web_1       | [2018/03/22 22:57:52] HTTP GET /static/css/normalize.css 304 [0.02, 172.17.0.1:32788]
web_1       | [2018/03/22 22:57:52] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.1:32798]
web_1       | [2018/03/22 22:57:52] WebSocket CONNECT /ws/converter/public/ [172.17.0.1:32798]
fileshiffty_data_1 exited with code 0
worker_1    | [2018-03-22 22:58:04,413: INFO/MainProcess] Received task: api.tasks.convert_office_to_pdf[287805aa-3c9c-4212-92d4-cac5872076f2]  
worker_1    | [2018-03-22 22:58:04,414: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fb72d567e18> (args:('api.tasks.convert_office_to_pdf', '287805aa-3c9c-4212-92d4-cac5872076f2', {'lang': 'py', 'task': 'api.tasks.convert_office_to_pdf', 'id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'eta': None, 'expires': None, 'group': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'parent_id': None, 'argsrepr': "('1521759484.3458297-Doc1.docx',)", 'kwargsrepr': '{}', 'origin': 'gen8@a478d8966021', 'reply_to': 'adf32365-ef93-327e-842f-7eff10fda37a', 'correlation_id': '287805aa-3c9c-4212-92d4-cac5872076f2', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}}, b'[["1521759484.3458297-Doc1.docx"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
web_1       | [2018/03/22 22:58:04] HTTP PUT /api/v1/fileupload/word/pdf/ 200 [0.07, 172.17.0.1:32788]
worker_1    | [2018-03-22 22:58:04,417: DEBUG/MainProcess] Task accepted: api.tasks.convert_office_to_pdf[287805aa-3c9c-4212-92d4-cac5872076f2] pid:9
web_1       | [2018/03/22 22:58:04] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.2:58928]
web_1       | [2018/03/22 22:58:04] WebSocket CONNECT /ws/converter/public/ [172.17.0.2:58928]
worker_1    | [2018-03-22 22:58:04,426: WARNING/ForkPoolWorker-2] /data/web/fileshiffty
worker_1    | [2018-03-22 22:58:04,427: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted
web_1       | {"message": "1521759484.3458297-Doc1.pdf", "progress": 50}
web_1       | {"message": "1521759484.3458297-Doc1.pdf", "progress": 75}

When I upload the file, I can confirm that it is added to the original folder, and the log entry worker_1 | [2018-03-22 22:58:04,427: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted shows the command the subprocess will call. However, when I look inside the converted folder, it's completely empty. The weird part is that when I bash into the Docker container and run the EXACT SAME thing, the file gets converted and put into the folder, like so:

root@4b9da6f71226:/data/web/fileshiffty/api# python3
Python 3.6.4 (default, Mar 14 2018, 17:49:05) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.call('libreoffice --headless --convert-to pdf original/1521759484.3458297-Doc1.docx --outdir ./converted', shell=True)
convert /data/web/fileshiffty/api/original/1521759484.3458297-Doc1.docx -> /data/web/fileshiffty/api/converted/1521759484.3458297-Doc1.pdf using writer_pdf_Export
0

Why does it work when I bash in and run the subprocess by hand, but not when it runs from the file? Could somebody please help me?
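One difference visible in the logs above: the worker prints its working directory as /data/web/fileshiffty, while the interactive session that worked was started in /data/web/fileshiffty/api. So the relative path original/... may point at different directories in the two cases. The sketch below builds the paths from an explicit base directory instead of os.getcwd(). conversion_paths is a hypothetical helper, and it assumes (not confirmed by the question) that original/ and converted/ live in the same directory as the tasks module.

```python
import os
from typing import Tuple


def conversion_paths(original_file: str, base_dir: str) -> Tuple[str, str, str]:
    """Return (source, outdir, expected_pdf) as paths that do not depend on os.getcwd().

    Hypothetical helper; assumes original/ and converted/ sit directly under base_dir.
    """
    source = os.path.join(base_dir, "original", original_file)
    outdir = os.path.join(base_dir, "converted")
    # LibreOffice names the output after the input with the extension replaced,
    # e.g. ...-Doc1.docx -> ...-Doc1.pdf, as in the shell output above.
    stem = os.path.splitext(os.path.basename(original_file))[0]
    expected_pdf = os.path.join(outdir, stem + ".pdf")
    return source, outdir, expected_pdf
```

Inside api/tasks.py, base_dir could be os.path.dirname(os.path.abspath(__file__)), which stays the same no matter which directory the Celery worker was started from.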

Edit: it seems that the subprocess command just isn't being executed. To find out what happens after the subprocess call, I changed the code to the following, and even used absolute paths:

import json
import os
import subprocess

import websocket


def convert_office_to_pdf(original_file):
    ws = websocket.WebSocket()
    ws.connect('ws://web:8000/ws/converter/public/')
    pure_file_name = os.path.splitext(os.path.basename(original_file))[0]
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name), 
        'progress': 50}))
    print(os.getcwd())
    print('libreoffice --headless --convert-to pdf original/{} --outdir ./converted'.format(original_file))
    command = ['libreoffice', '--headless', '--convert-to', 'pdf', '{}/original/{}'.format(os.getcwd(), original_file), '--outdir', '{}/converted'.format(os.getcwd())]
    # Only stdout is piped, so communicate() below always returns None for err.
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    out, err = process.communicate()
    print(out)
    print(err)
    print('------------------------------------------------')
    ws.send(json.dumps({
        'message': '{}.pdf'.format(pure_file_name), 
        'progress': 75}))
    upload_file_to_s3(pure_file_name, 'pdf', ws)
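Because the task sends progress 75 regardless of what LibreOffice did, a failed conversion is easy to miss. A possible sketch that captures both streams, checks the exit status, and confirms the expected file exists before moving on. run_conversion is a hypothetical helper; it uses stdout=/stderr=/universal_newlines= rather than capture_output=/text=, since the container runs Python 3.6 and those arrived in 3.7.

```python
import os
import subprocess
from typing import List


def run_conversion(command: List[str], expected_output: str) -> str:
    """Run command; raise with its output if it fails or writes no file (hypothetical helper)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    if result.returncode != 0:
        raise RuntimeError(
            "command failed with exit code {}: stdout={!r} stderr={!r}".format(
                result.returncode, result.stdout, result.stderr))
    # A zero exit status alone does not prove the file was written, so check for it.
    if not os.path.exists(expected_output):
        raise RuntimeError(
            "{} was not created: stdout={!r} stderr={!r}".format(
                expected_output, result.stdout, result.stderr))
    return result.stdout
```

In the task, the command list from the question and the expected PDF path would be passed in, and the exception message would then show up in the Celery log instead of the task silently reporting progress.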

and I get the following output

 [2018-03-22 23:44:54,668: DEBUG/MainProcess] Task accepted: api.tasks.convert_office_to_pdf[721ed2db-6a74-4fd2-9484-0fca14df7c01] pid:9
web_1       | [2018/03/22 23:44:54] WebSocket HANDSHAKING /ws/converter/public/ [172.17.0.2:60898]
web_1       | [2018/03/22 23:44:54] WebSocket CONNECT /ws/converter/public/ [172.17.0.2:60898]
worker_1    | [2018-03-22 23:44:54,696: WARNING/ForkPoolWorker-2] /data/web/fileshiffty
worker_1    | [2018-03-22 23:44:54,696: WARNING/ForkPoolWorker-2] libreoffice --headless --convert-to pdf original/1521762293.8511283-Doc1.docx --outdir ./converted
web_1       | {"message": "1521762293.8511283-Doc1.pdf", "progress": 50}
worker_1    | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] b''
worker_1    | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] None
worker_1    | [2018-03-22 23:44:55,283: WARNING/ForkPoolWorker-2] ------------------------------------------------
web_1       | {"message": "1521762293.8511283-Doc1.pdf", "progress": 75}

print(out) just prints an empty bytes object (b''), and print(err) just prints None.
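err is None there because Popen.communicate() only returns data for streams that were set to subprocess.PIPE; the edited code pipes stdout but not stderr, so whatever LibreOffice writes to stderr goes to the worker's own stderr instead of into err. A minimal demonstration, using the Python interpreter as a stand-in child process:

```python
import subprocess
import sys

# Only stdout is piped, as in the edited task code.
proc = subprocess.Popen([sys.executable, "-c", "print('hi')"], stdout=subprocess.PIPE)
out1, err1 = proc.communicate()
print(out1, err1)  # err1 is None because stderr was not piped

# Piping stderr as well makes communicate() return what the child wrote there.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stderr.write('oops')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out2, err2 = proc.communicate()
print(out2, err2)  # out2 is b'' and err2 holds the stderr bytes
```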

Edit 2 - This is the docker-compose file

web:
  restart: always
  tty: true
  build: ./web/
  working_dir: /data/web/fileshiffty
  expose:
    - "8000"
  ports:
    - "8000:8000"
  links:
    - postgres:postgres
    - redis:redis
  env_file: env
  volumes:
    - ./web:/data/web
  command: bash -c "python3 manage.py runserver 0.0.0.0:8000"
  # command: /usr/bin/gunicorn fileshiffty.wsgi:application -w 2 -b :8000
nginx:
  restart: always
  build: ./nginx/
  ports:
    - "80:80"
  volumes_from:
    - web
  links:
    - web:web
postgres:
  restart: always
  image: postgres:latest
  volumes_from:
    - data
  volumes:
    - ./postgres/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
    - ./backups/postgresql:/backup
  env_file:
    - env
  expose:
    - "5432"
redis:
  restart: always
  image: redis:latest
  expose:
    - "6379"
worker:
    build: ./web/
    working_dir: /data/web/fileshiffty
    command: bash -c "celery -A fileshiffty worker --loglevel=DEBUG"
    volumes:
      - ./web:/data/web
    links:
      - postgres:postgres
      - redis:redis
      - web:web
data:
  restart: always
  image: alpine
  volumes:
    - /var/lib/postgresql
  command: "true"
alpha787

2 Answers


Check whether the version of Python you developed your code with is the same as the one you build your container with. I faced a very similar issue: I was using subprocess.call() to run something on the command line. My code ran perfectly on my local machine, but it failed at subprocess.call() inside the Docker container. Strangely, though, it did run inside the container if I typed the subprocess.call() into the interactive Python shell. I even tried os.system(), with the same result.

It was finally resolved as soon as I made the Python versions the same (initially 3.7.3 for development and 3.5 for the Docker container). I hope the same works for you!

Also, if someone could add more technical insight into why this dirty fix works, that would be great.
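To check whether the interpreter versions actually differ, one option is to log the interpreter from inside the task (Celery shows print output as WARNING lines, as in the question's logs) and compare it with the one you develop on. A small sketch; interpreter_info is a hypothetical helper name.

```python
import platform
import sys


def interpreter_info() -> dict:
    """Describe the running Python interpreter (hypothetical helper)."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        "version_info": tuple(sys.version_info[:3]),
    }


if __name__ == "__main__":
    print(interpreter_info())
```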


A few possible reasons:

  1. Does this happen only when multiple users call your web API at once, each invoking libreoffice? If so, you need to make sure that every concurrent libreoffice process has its own independent user installation directory. You can set a custom one using libreoffice -env:UserInstallation=file:///tmp/test.

  2. If your model is that you start a libreoffice process ahead of time, so that later libreoffice processes just forward the request to the already-started worker, what version of LibreOffice do you use? For example, the 6.1 line had a bug where we did not wait for the conversion result; see https://gerrit.libreoffice.org/#/c/66168/ for the fix. (The About dialog shows a version string and an exact git hash. 6.1.5 already has this fix, but 6.1.4 does not.)
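For the first point, one way to give every conversion its own profile is to create a temporary directory per call and pass it as a file:// URL through -env:UserInstallation. A sketch under those assumptions; user_installation_arg and convert_with_private_profile are hypothetical names, the other flags are the ones from the question, and the profile directory is deleted when the conversion finishes.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def user_installation_arg(profile_dir: str) -> str:
    # LibreOffice expects a URL here, e.g. -env:UserInstallation=file:///tmp/test
    return "-env:UserInstallation=" + Path(os.path.abspath(profile_dir)).as_uri()


def convert_with_private_profile(source_path: str, outdir: str) -> int:
    """Convert one file using a throwaway LibreOffice user profile (hypothetical helper)."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile_dir:
        command = [
            "libreoffice",
            user_installation_arg(profile_dir),
            "--headless",
            "--convert-to", "pdf",
            source_path,
            "--outdir", outdir,
        ]
        # The temporary profile is removed when the with-block exits.
        return subprocess.call(command)
```

Using a fresh directory per call trades some startup time (LibreOffice initializes a new profile each time) for isolation between concurrent conversions.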

Miklos Vajna