
I'm trying to install the apache-airflow-providers-apache-hdfs library in my Airflow Docker image (Airflow 2.5.3).

I've installed all the necessary Kerberos libraries, but I get the following error:

#0 5.236 Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /home/airflow/.local/lib/python3.10/site-packages (from aiohttp->apache-airflow-providers-http->apache-airflow>=2.3.0->apache-airflow-providers-apache-hdfs==3.2.1->-r /requirements.txt (line 2)) (4.0.2)
#0 5.311 Collecting krb5>=0.3.0
#0 5.335   Downloading krb5-0.5.0.tar.gz (220 kB)
#0 5.353      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 221.0/221.0 kB 13.2 MB/s eta 0:00:00
#0 5.394   Installing build dependencies: started
#0 9.611   Installing build dependencies: finished with status 'done'
#0 9.618   Getting requirements to build wheel: started
#0 9.945   Getting requirements to build wheel: finished with status 'error'
#0 9.951   error: subprocess-exited-with-error
#0 9.951   
#0 9.951   × Getting requirements to build wheel did not run successfully.
#0 9.951   │ exit code: 1
#0 9.951   ╰─> [22 lines of output]
#0 9.951       /bin/sh: 1: krb5-config: Permission denied
#0 9.951       Using krb5-config at 'krb5-config'
#0 9.951       Traceback (most recent call last):
#0 9.951         File "/home/airflow/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#0 9.951           main()
... (part of the log is removed) ...
#0 9.951         File "/usr/local/lib/python3.10/subprocess.py", line 526, in run
#0 9.951           raise CalledProcessError(retcode, process.args,
#0 9.951       subprocess.CalledProcessError: Command '('krb5-config --cflags krb5',)' returned non-zero exit status 127.
#0 9.951       [end of output]
#0 9.951   
#0 9.951   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 9.953 error: subprocess-exited-with-error
#0 9.953 
#0 9.953 × Getting requirements to build wheel did not run successfully.
#0 9.953 │ exit code: 1
#0 9.953 ╰─> See above for output.

In my Dockerfile, I install all the Kerberos libraries as root. When I run pip install -r requirements.txt, I use the airflow user.

My Dockerfile:

FROM apache/airflow:2.5.3-python3.10

ENV DEBIAN_FRONTEND=noninteractive
ENV TERM linux

USER root

RUN set -ex \
    && buildDeps=' \
        krb5-config \
        krb5-user \
        libpam-krb5 \
        libkrb5-dev \
    ' \
    && apt-get -qq update \
    && apt-get -yqq install --no-install-recommends $buildDeps \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get -yqq clean


USER airflow

# Copy requirements.txt into the image
COPY requirements.txt /

## the code failed on the following line ##
RUN pip install --no-cache-dir -r /requirements.txt

My requirements.txt file:

apache-airflow-providers-apache-hdfs==3.2.1

I'm not sure what I'm missing. Should I manually change the file mode of the krb5-config file? I can't find the krb5-config file inside the container.

Thanks for your help and guidance.

Donny

1 Answer

A co-worker helped me with the configuration. She found in a Stack Overflow thread that we need to install the heimdal-dev package.

This is the final Dockerfile:

FROM apache/airflow:2.5.3-python3.10
ADD requirements.txt .
USER root
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
         gcc \
         heimdal-dev \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install -r requirements.txt
RUN pip uninstall -y argparse
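
The log in the question shows the krb5 build shelling out to krb5-config, so it may help to confirm that the tool is visible to the airflow user before pip runs. The following is only a minimal sketch: have_tool is a hypothetical helper name, not something from Airflow or the krb5 package, and it assumes a POSIX shell is available in the image.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original setup):
# report whether a build tool is on PATH and executable for the current user.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        # Print where the tool was resolved from.
        echo "$1 found at $(command -v "$1")"
        return 0
    fi
    # Name the user so a root-vs-airflow mismatch is easy to spot.
    echo "$1 not found on PATH for user $(id -un)" >&2
    return 1
}
```

Calling have_tool krb5-config as the airflow user, followed by krb5-config --cflags krb5 (the same command the build runs in the log), should fail fast with a readable message if the tool is missing. Note also that the Dockerfile in the question purges $buildDeps in the same RUN step that installs them, which would by itself appear to leave krb5-config absent.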
Donny