Following this guide I have an (Alpine) Docker Image running on AWS Lambda.

The image contains an app.py which is a simple .docx -> .pdf document converter. At the core is the following code, which works in the Docker Container on my local dev box, but raises subprocess.CalledProcessError on an actual Lambda deployment:

def handler(event, context):
    src_filename = event['filename']

    filename_body, _ = os.path.splitext(src_filename)

    src_filepath = '/tmp/test-template.docx'
    shutil.copyfile('/home/app/test-template.docx', src_filepath)  # for testing
    print( subprocess.check_output(['ls', '-l', '/tmp'] ) )
    # ^ -rw-rw-r--  20974 bytes  test-template.docx

    LIBRE_BINARY = '/usr/bin/soffice'
    print( subprocess.check_output(['ls', '-l', LIBRE_BINARY] ) )
    # ^ lrwxrwxrwx  /usr/bin/soffice -> /usr/lib/libreoffice/program/soffice

    MAX_TRIES = 3
    success = False

    print(f'Processing file: {src_filepath} with LibreOffice')
    for kTry in range(MAX_TRIES):
        print(f'Conversion Attempt #{kTry}')
        try:
            # https://stackoverflow.com/questions/4256107/running-bash-commands-in-python
            result = subprocess.run(
                [
                    LIBRE_BINARY,
                        '--headless',
                        '--invisible',
                        '--nodefault',
                        '--nofirststartwizard',
                        '--nolockcheck',
                        '--nologo',
                        '--norestore',
                        '--convert-to', 'pdf:writer_pdf_Export',
                        '--outdir', TMP_FOLDER,
                        src_filepath
                ],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                shell=False,
                check=True,
                text=True
            )

        except subprocess.CalledProcessError as e:
            # Note: e.output holds only the captured stdout; stderr is in e.stderr.
            raise RuntimeError(f"\tGot exit code {e.returncode}. Msg: {e.output}") from e

Response string is:

[ERROR] RuntimeError:   Got exit code 77. Msg: 
Traceback (most recent call last):
  File "/home/app/app.py", line 82, in handler
    raise RuntimeError(f"\tGot exit code {e.returncode}. Msg: {e.output}") from e
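
A note on why Msg: is empty here: with stdout and stderr piped separately, e.output contains only stdout, and the diagnostic text most likely went to stderr. The following is a minimal sketch of reporting both streams. The helper name run_and_report is hypothetical, and the failing command is a stand-in child process (not soffice) that imitates an exit code of 77 with a message on stderr.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure raise RuntimeError carrying both stdout and stderr."""
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
            text=True,
        )
    except subprocess.CalledProcessError as e:
        # e.stdout (alias e.output) and e.stderr are separate attributes.
        raise RuntimeError(
            f"Got exit code {e.returncode}. "
            f"stdout: {e.stdout!r} stderr: {e.stderr!r}"
        ) from e


# Stand-in for the failing call: writes only to stderr, then exits with 77.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('User installation could not be completed'); sys.exit(77)",
]

message = ""
try:
    run_and_report(failing_cmd)
except RuntimeError as err:
    message = str(err)
    print(message)
```

With this shape, the LibreOffice error text would appear in the Lambda response instead of an empty Msg: field, assuming soffice writes it to stderr.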

How is it possible that this succeeds on my local machine but fails on AWS?

It is the same container image executing. It is entirely self-contained. The problem is definitely coming from this subprocess.run command.

Here is my aws lambda create-function:

    aws lambda create-function  \
        --function-name $AWS_LAMBDAFUNC_NAME \
        --role $role_arn \
        --code ImageUri=$full_url \
        --package-type Image \
        --memory-size 8192 \
        --timeout 300 \
        --publish

I've used a large memory size and a large timeout.

I have read that writing to the file system outside of my /home/app folder and outside of /tmp might be problematic. So I am careful to use no such writes.

So what could be the problem?

It works from BASH

If I perform this processing in my entry.sh it works:

#!/bin/sh

/usr/bin/soffice \
    --headless \
    --invisible \
    --nodefault \
    --nofirststartwizard \
    --nolockcheck \
    --nologo \
    --norestore \
    --convert-to pdf:writer_pdf_Export \
    --outdir /tmp \
    /home/app/test-template.docx \
        &> /home/app/output_and_error_file

ls /tmp >> /home/app/output_and_error_file

exec python -m awslambdaric $1

output_and_error_file:

{"response": "convert /home/app/test-template.docx -> /tmp/test-template.pdf using filter : writer_pdf_Export
hsperfdata_root
test-template.pdf"}

So it must be that something about subprocess is grating against the Lambda runtime.

Test: Using os.system

os.system( 
    f'export HOME=/home/app && {LIBRE_BINARY}' \
    f'   --headless --invisible --nodefault --nofirststartwizard' \
    f'   --nolockcheck --nologo --norestore' \
    f'   --convert-to pdf:writer_pdf_Export' \
    f'   --outdir {TMP_FOLDER}' \
    f'   {src_filepath}' 
    )

This produces a more descriptive error:

START RequestId: f2c18863-977e-46e4-a138-c1db80759406 Version: $LATEST
Executing 'app.handler' in function directory '/home/app'
b'total 24\n-rw-rw-r--    1 sbx_user 990          20974 Jan  8 11:01 test-template.docx\n'
/usr/bin/soffice
b'lrwxrwxrwx    1 root     root            36 Jan  8 03:56 /usr/bin/soffice -> /usr/lib/libreoffice/program/soffice\n'
Processing file: /tmp/test-template.docx with LibreOffice
Conversion Attempt #0
javaldx failed!
Warning: failed to read path from javaldx
LibreOffice 6.4 - Fatal Error: The application cannot be started. 
User installation could not be completed. 
Unknown error with saving to S3: <class 'FileNotFoundError'>
END RequestId: f2c18863-977e-46e4-a138-c1db80759406
REPORT RequestId: f2c18863-977e-46e4-a138-c1db80759406  Duration: 2698.32 ms    Billed Duration: 5585 ms    Memory Size: 8192 MB    Max Memory Used: 175 MB Init Duration: 2885.69 ms   

soffice --version works

result = subprocess.run(
    [LIBRE_BINARY, '--version'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=False,
    check=True,
    text=True
)

This works fine!

P i

1 Answer

This happens because LibreOffice needs to create a directory called .cache/dconf in the user's home directory.

But on AWS Lambda the user's home directory is read-only, so LibreOffice fails with "Fatal Error: The application cannot be started."

On Lambda, only /tmp is writable. The rest of the file system, including the image's own directories such as /home/app, is read-only.
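
To check this from inside a handler, a small probe like the one below can help. It is only a sketch: is_writable is a hypothetical helper, and it tests writability by actually trying to create a temporary file rather than by inspecting permission bits.

```python
import os
import tempfile


def is_writable(directory):
    """Return True if a file can actually be created inside `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers read-only file systems, missing directories and
        # permission errors alike.
        return False


# Compare the current home directory with the system temp directory.
for label, path in [
    ("home", os.path.expanduser("~")),
    ("temp", tempfile.gettempdir()),
]:
    print(f"{label}: {path!r} writable={is_writable(path)}")
```

Printing this from the Lambda handler and from the local container should show whether the two environments differ in where the process is allowed to write.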

So the solution is to point HOME at a temporary directory for the duration of the subprocess.run(...) call.

import os
import subprocess
import tempfile

temp_dir = tempfile.TemporaryDirectory()
temp_dir_path = temp_dir.name

subprocess.run(
    f"soffice --headless --convert-to pdf {temp_dir_path}/input.xlsx --outdir {temp_dir_path}",
    shell=True,
    check=True,
    # LibreOffice needs to create a directory called .cache/dconf in HOME,
    # so HOME must be writable. On AWS Lambda the default HOME is read-only.
    # env= replaces the whole environment, so merge with os.environ to keep
    # PATH and the other variables instead of passing HOME alone.
    env={**os.environ, "HOME": temp_dir_path},
)
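
Applied to the command from the question, the same idea might look like the sketch below. The function names build_conversion and convert_to_pdf are hypothetical; the binary path and flags are copied from the question. The output directory is kept separate from the throwaway HOME so the PDF survives when the temporary directory is cleaned up.

```python
import os
import subprocess
import tempfile

LIBRE_BINARY = "/usr/bin/soffice"  # path taken from the question


def build_conversion(src_filepath, out_dir, home_dir):
    """Return (args, env) for a headless .docx -> .pdf conversion."""
    args = [
        LIBRE_BINARY,
        "--headless",
        "--invisible",
        "--nodefault",
        "--nofirststartwizard",
        "--nolockcheck",
        "--nologo",
        "--norestore",
        "--convert-to", "pdf:writer_pdf_Export",
        "--outdir", out_dir,
        src_filepath,
    ]
    # Keep the existing environment and override only HOME.
    env = {**os.environ, "HOME": home_dir}
    return args, env


def convert_to_pdf(src_filepath, out_dir):
    """Run the conversion with HOME pointing at a writable temp directory."""
    with tempfile.TemporaryDirectory() as home_dir:
        args, env = build_conversion(src_filepath, out_dir, home_dir)
        return subprocess.run(
            args,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
            text=True,
        )
```

Inside the handler this would be called as convert_to_pdf('/tmp/test-template.docx', '/tmp'), assuming the source file has already been copied to /tmp as in the question.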
Lucidity