I tried installing Airflow with this command, and I got an error message.

#pip3 install apache-airflow[postgres,gcp,aws,celery]

I followed the installation instructions exactly. What went wrong here?

https://airflow.apache.org/docs/stable/installation.html

  

... a long list of successful output, and then this:

building 'psutil._psutil_linux' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/psutil

gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=567 -DPSUTIL_LINUX=1 -I/usr/include/python3.6m -c psutil/_psutil_common.c -o build/temp.linux-x86_64-3.6/psutil/_psutil_common.o

    psutil/_psutil_common.c:9:10: fatal error: Python.h: No such file or directory
    #include <Python.h>
                  ^~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
    Command "/usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1jwpvsnq/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ni_brusw-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-1jwpvsnq/psutil/

The installation instructions for Airflow using Postgres as a backend are incorrect.

The Apache Airflow documentation is not clear on how to get Airflow installed and configured with Postgres as a backend.


I installed with:

pip install apache-airflow[postgres]

Now what?

The installation instructions have a link to "Initializing a Database Backend".

I click there. It has this line:

If you decide to use Postgres, we recommend using the psycopg2 driver and specifying it in your SqlAlchemy connection string.

OK, so I ran 'pip install psycopg2'. Is this enough to satisfy the recommendation to use the 'psycopg2' driver?

So where is my 'SqlAlchemy connection string' located? In what file? I don't see any files anywhere.


Now it says:

Also note that since SqlAlchemy does not expose a way to target a specific schema in the Postgres connection URI, you may want to set a default schema for your role with a command similar to ALTER ROLE username SET search_path = airflow, foobar;

What does this mean?

Does this mean I should create a role/username for Airflow/SqlAlchemy to use?

And if so, what would be a good username? ('airflow'?)


And where do I set the Postgres connection URI, and exactly what would the syntax be?

The installation instructions read as suggestions, using wording like "may want to", and then give no specifics on how to follow those suggestions.


I don't think install instructions should use wording that indicates suggestions; I simply want step-by-step instructions on how to get this working.


It then says:

Once you’ve setup your database to host Airflow

How did I set up my database to host Airflow? What did I do there that achieved that goal?


Then it says:

you’ll need to alter the SqlAlchemy connection string located in your configuration file $AIRFLOW_HOME/airflow.cfg

I don't see an airflow.cfg anywhere; it's nowhere to be found.


It then says:

You should then also change the “executor” setting to use “LocalExecutor”, an executor that can parallelize task instances locally.

What does that mean?


At this point it says:

# initialize the database
airflow initdb

I don't think I should run that at this point because I have no idea what the previous instructions were talking about.


If what should be very simple documentation is this bad, I can't imagine what the code looks like.

halfer
user10664542
  • Maybe best to open an issue on their GitHub page? Try to phrase your question in a friendlier way, though :) – Cleb Feb 12 '20 at 22:03
  • Will do, thank you. I'm two days into this and things are still not installed correctly and working because of the poor documentation. We are instead leaning towards a commercial/paid workflow tool; we believe we would save money going that route. – user10664542 Feb 13 '20 at 06:38
  • What is your OS? – sophros Feb 13 '20 at 06:47

1 Answer

This part of your error message suggests that you are missing the Python header (.h) files:

gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=567 -DPSUTIL_LINUX=1 -I/usr/include/python3.6m -c psutil/_psutil_common.c -o build/temp.linux-x86_64-3.6/psutil/_psutil_common.o

    psutil/_psutil_common.c:9:10: fatal error: Python.h: No such file or directory
    #include <Python.h>
                  ^~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1

You will find information about installing the headers here: "I have Python on my Ubuntu system, but gcc can't find Python.h"
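
As a quick check before retrying the install, the sketch below asks the interpreter where it expects Python.h to be and reports whether the file is there. It uses only the standard library; the function names are hypothetical, and it assumes you run it with the same interpreter that pip uses (python3.6 in the log above). The development package that provides the headers is commonly named python3-dev on Debian/Ubuntu and python3-devel on Fedora/RHEL/CentOS, but check your own distribution.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to be."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def headers_installed():
    """Return True if Python.h exists at the expected location."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    path = python_header_path()
    if headers_installed():
        print(f"Found {path}")
    else:
        # Without this file, C extensions such as psutil cannot be compiled.
        print(f"Missing {path}: install your distribution's Python development package")
```

If the script reports the file as missing, install the development package and run the pip command again; psutil should then get past the gcc step shown in the error.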

sophros
  • Hi @user10664542! Did it help you resolve the issue? If yes, then as a token of appreciation please mark the answer with the tick beside it. – sophros Feb 16 '20 at 11:42