I have a working Airflow 1.9 environment running on an Amazon EC2 instance. I need to upgrade to the latest version of Airflow, which is 1.10. I can either upgrade the existing 1.9 installation or install 1.10 fresh on a new server. Airflow 1.10 is not listed on PyPI, so I'm installing it from Git with this command:

pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable

This command successfully installs Airflow 1.10, which you can confirm by running `airflow version` and checking the output:

  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   v1.10.0

When I try to start the Airflow scheduler with `airflow scheduler`, I get the following exception:

ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,195] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,195] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
    state = task.state
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
    return self._get_task_meta()['status']
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
    meta = self._get_task_meta_for(task_id)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
    session = self.ResultSession()
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
    **self.engine_options)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
    engine, session = self.create_session(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
    engine = self.get_engine(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
    return create_engine(dburi, poolclass=NullPool)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
    return strategy.create(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
    return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,196] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,196] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
    state = task.state
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
    return self._get_task_meta()['status']
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
    meta = self._get_task_meta_for(task_id)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
    session = self.ResultSession()
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
    **self.engine_options)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
    engine, session = self.create_session(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
    engine = self.get_engine(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
    return create_engine(dburi, poolclass=NullPool)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
    return strategy.create(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
    return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,197] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,197] {celery_executor.py:113} ERROR - No module named 'MySQLdb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 94, in sync
    state = task.state
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 471, in state
    return self._get_task_meta()['status']
  File "/usr/local/lib/python3.6/site-packages/celery/result.py", line 410, in _get_task_meta
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/usr/local/lib/python3.6/site-packages/celery/backends/base.py", line 365, in get_task_meta
    meta = self._get_task_meta_for(task_id)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 53, in _inner
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 122, in _get_task_meta_for
    session = self.ResultSession()
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/__init__.py", line 99, in ResultSession
    **self.engine_options)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 59, in session_factory
    engine, session = self.create_session(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 45, in create_session
    engine = self.get_engine(dburi, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/backends/database/session.py", line 42, in get_engine
    return create_engine(dburi, poolclass=NullPool)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 391, in create_engine
    return strategy.create(*args^C[2018-08-14 14:03:16,424] {jobs.py:1585} INFO - Exited execute loop
[2018-08-14 14:03:16,433] {jobs.py:1599} INFO - Terminating child PID: 13615

Here's what my lib folder contains:

[/usr/local/lib/python3.6/site-packages]# cd /usr/local/lib64/python3.6/site-packages/sqlalchemy/
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# ll
total 320
drwxr-xr-x  3 root root  4096 Aug 13 17:17 connectors
-rwxr-xr-x  1 root root 40456 Aug 13 17:17 cprocessors.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root 51408 Aug 13 17:17 cresultproxy.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root 21944 Aug 13 17:17 cutils.cpython-36m-x86_64-linux-gnu.so
drwxr-xr-x  3 root root  4096 Aug 13 17:17 databases
drwxr-xr-x 10 root root  4096 Aug 13 17:17 dialects
drwxr-xr-x  3 root root  4096 Aug 13 17:17 engine
drwxr-xr-x  3 root root  4096 Aug 13 17:17 event
-rwxr-xr-x  1 root root 49746 Mar  6 14:01 events.py
-rwxr-xr-x  1 root root 12030 Mar  6 14:01 exc.py
drwxr-xr-x  4 root root  4096 Aug 13 17:17 ext
-rwxr-xr-x  1 root root  2249 Mar  6 14:01 __init__.py
-rwxr-xr-x  1 root root  3093 Mar  6 14:01 inspection.py
-rwxr-xr-x  1 root root 10967 Mar  6 14:01 interfaces.py
-rwxr-xr-x  1 root root  6712 Mar  6 14:01 log.py
drwxr-xr-x  3 root root  4096 Aug 13 17:17 orm
-rwxr-xr-x  1 root root 49883 Mar  6 14:01 pool.py
-rwxr-xr-x  1 root root  5217 Mar  6 14:01 processors.py
drwxr-xr-x  2 root root  4096 Aug 13 17:17 __pycache__
-rwxr-xr-x  1 root root  1200 Mar  6 14:01 schema.py
drwxr-xr-x  3 root root  4096 Aug 13 17:17 sql
drwxr-xr-x  5 root root  4096 Aug 13 17:17 testing
-rwxr-xr-x  1 root root  1713 Mar  6 14:01 types.py
drwxr-xr-x  3 root root  4096 Aug 13 17:17 util
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# pwd
/usr/local/lib64/python3.6/site-packages/sqlalchemy
root@ip-1-2-3-4
[/usr/local/lib64/python3.6/site-packages/sqlalchemy]# cd /usr/local/lib/python3.6/site-packages/sqlalchemy/
bash: cd: /usr/local/lib/python3.6/site-packages/sqlalchemy/: No such file or directory

I'm confused about why the Airflow installation didn't take care of all of its required dependencies. Am I installing Airflow incorrectly? I really need to be on version 1.10 because version 1.9 has a major bug, as discussed here and here.

Kyle Bridenstine
  • Doesn't seem like this question has anything to do with `amazon-web-services`, `amazon-ec2`, or likely even `pip` – erik258 Aug 15 '18 at 15:13
  • @DanFarrell thanks, I think you're right. I've modified the post accordingly, although I would like to keep pip in there for now. – Kyle Bridenstine Aug 15 '18 at 15:17
  • If mysql is just one of several possible databases (seems likely), then it isn't unusual for the individual database drivers to be omitted from the package. But you could search for / raise a GitHub issue to be sure. – erik258 Aug 15 '18 at 15:22
  • For others who find this thread, the 1.10 release is not yet published on PyPI because it has not yet been officially released. Installing the 1.10 release candidates (RCs) or the 1.10 stable branch in the meantime (like the pip install command in the question does) is the best way to test it ahead of the release. – Taylor D. Edmiston Aug 15 '18 at 16:08
  • @TaylorEdmiston I'm just curious, why hasn't 1.10 officially been released yet? Is it stable? – Kyle Bridenstine Aug 15 '18 at 16:13
  • There's a formal Apache release process that requires a cycle of testing (by everyone) and voting (by Airflow committers) to help reduce bugs and improve stability. Each time a critical bug or regression is found, it's fixed in a follow-up RC. Once the final RC receives enough votes, it becomes the official release. Here's some more info from Apache: [Publishing Releases](http://www.apache.org/dev/release-publishing.html), [Release Policy](http://www.apache.org/legal/release-policy.html), [Incubation Policy - Releases](https://incubator.apache.org/policy/incubation.html#releases). – Taylor D. Edmiston Aug 15 '18 at 17:27
  • The Airflow 1.10 release is published on PyPI now, and some properties have been renamed, such as `celery_result_backend -> result_backend`. For more details see https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#celery-config – fcce Aug 28 '18 at 08:23

1 Answer

There are a number of install extras ("optional dependencies") one can provide when doing a fresh install. Airflow doesn't install them all by default because there are dozens and some require special dependencies like Mesos or Kubernetes.

https://airflow.readthedocs.io/en/stable/installation.html#extra-packages

Note that for 1.10.0-1.10.2 you need to either prefix the install command with this environment variable or export it first:

export SLUGIFY_USES_TEXT_UNIDECODE=yes

This is no longer required for 1.10.3 and up.
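If the install is driven from a script, the version rule above can be encoded in a small helper. This is only a sketch: the function names `needs_slugify_flag` and `install_env` are hypothetical, and it assumes plain dotted version strings such as `1.10.2` (no release-candidate suffixes).

```python
import os


def needs_slugify_flag(version: str) -> bool:
    """Return True for Airflow 1.10.0 through 1.10.2, per the note above."""
    # Parse "1.10.2" into (1, 10, 2); pad short versions such as "1.10".
    parts = tuple(int(p) for p in version.split("."))
    parts = (parts + (0, 0, 0))[:3]
    return (1, 10, 0) <= parts < (1, 10, 3)


def install_env(version: str) -> dict:
    """Copy the current environment, adding the flag only when required."""
    env = dict(os.environ)
    if needs_slugify_flag(version):
        env["SLUGIFY_USES_TEXT_UNIDECODE"] = "yes"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when invoking pip, which has the same effect as the `export` line for that one command.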

Once 1.10 is released you'll be able to install extras like this:

pip install apache-airflow[celery,devel,postgres]

When installing from git, the pip syntax for installing extras is a little more complicated:

pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[celery,devel,postgres]
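The pieces of that requirement string are easier to see when assembled programmatically. The helper below is a hypothetical illustration (the name `git_requirement` is not part of pip or Airflow); it only builds the string and does not run pip.

```python
def git_requirement(repo_url, ref, package, extras=()):
    """Build a pip requirement for a git branch or tag, with optional extras."""
    # Layout: git+<repo url>@<branch or tag>#egg=<package>[extra1,extra2]
    requirement = f"git+{repo_url}@{ref}#egg={package}"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    return requirement


print(git_requirement(
    "git://github.com/apache/incubator-airflow.git",
    "v1-10-stable",
    "apache-airflow",
    ["celery", "devel", "postgres"],
))
```

The extras list goes after the `#egg=` package name rather than after the URL, which is the part that differs from the PyPI form.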

If you're trying to install Airflow with MySQL support, you can include the mysql extra:

pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[mysql]
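After installing with the mysql extra, one way to confirm that the driver from the traceback is now present is to check whether the `MySQLdb` module can be found, without starting the scheduler. A minimal sketch, where the helper name `missing_modules` is hypothetical and only top-level module names are assumed:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when no installed package provides the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


# MySQLdb is the module named in the ModuleNotFoundError from the question.
print(missing_modules(["MySQLdb"]))
```

An empty list means the module is installed for the interpreter that ran the check, so it should be run with the same Python that runs Airflow.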

If you really do want to install all extras, you can use the all extra:

pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[all]

Note: If you previously installed any extras for apache-airflow 1.9 on PyPI, you'd need to provide them again here when installing 1.10 from GitHub since pip doesn't associate the GitHub repo with the PyPI package.


Questions

  • Are you running Python 3.6.5?
  • Do you still get the same error if you include mysql extra on install?
Taylor D. Edmiston
  • we are using Python3.6. Let me try this out and see if I get the same error when including mysql. Thank you! – Kyle Bridenstine Aug 15 '18 at 16:09
  • getting errors when running the command for [all]: Collecting mysqlclient>=1.3.6 (from apache-airflow[all]) 100% Complete output from command python setup.py egg_info: sh: mysql_config: command not found ... EnvironmentError: mysql_config not found Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-MxuvAR/mysqlclient/ – Kyle Bridenstine Aug 15 '18 at 16:27
  • Fixed it by running `sudo yum install -y python36 python36-devel python36-setuptools python36-pip.noarch gcc`. Installing Airflow 1.10 now. – Kyle Bridenstine Aug 15 '18 at 17:28
  • Cool - glad to hear it was resolved. Feel free to send a PR adding to the docs back to Airflow if you think this could be made clearer for the next person. – Taylor D. Edmiston Aug 15 '18 at 17:32
  • The fix they put is wrong... https://github.com/apache/incubator-airflow/pull/2484/commits/4e60701285177206e84e2b6bd21c7796935e3c91 check out this post https://stackoverflow.com/a/36160103/3299397 the fix they put is giving us the hostname not the IP address. We have to use `socket.gethostbyname(socket.gethostname())` – Kyle Bridenstine Aug 15 '18 at 17:35
  • Sorry let me post that on the more relevant question here https://stackoverflow.com/questions/51365911/airflow-logs-brokenpipeexception/51790409#51790409 and here https://stackoverflow.com/questions/51775370/airflowexception-celery-command-failed-the-recorded-hostname-does-not-match-t – Kyle Bridenstine Aug 15 '18 at 17:37
  • v1.10.3 no longer requires `export SLUGIFY_USES_TEXT_UNIDECODE=yes` – trker May 22 '19 at 19:35
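
The `mysql_config: command not found` error reported in the comments suggests a pre-flight check for build tools before running the install. The sketch below is an assumption-laden illustration: the helper name `locate_tools` is hypothetical, and the tool list (`mysql_config`, `gcc`) is taken from the error message and the `yum` fix in the comments, not from any official Airflow requirement list.

```python
import shutil


def locate_tools(tools):
    """Map each tool name to its path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


# Tool names taken from the comments above; adjust for your platform.
for tool, path in locate_tools(["mysql_config", "gcc"]).items():
    print(f"{tool}: {path or 'not found'}")
```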