I am trying to build a generic tool to bootstrap an EMR cluster. Some of the jobs we run in PySpark on EMR require psycopg2. Others don't. For the ones that require psycopg2, we need to yum install postgresql-devel. Otherwise, we don't. So I'm trying to detect if psycopg2 is a dependency.
However, every approach I've tried so far (using pip-23.0.1) results in this output from PostgresConfig when psycopg2 is a dependency:
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [25 lines of output]
running egg_info
creating /mnt/tmp/pip-pip-egg-info-qjiuibm3/psycopg2.egg-info
writing /mnt/tmp/pip-pip-egg-info-qjiuibm3/psycopg2.egg-info/PKG-INFO
writing dependency_links to /mnt/tmp/pip-pip-egg-info-qjiuibm3/psycopg2.egg-info/dependency_links.txt
writing top-level names to /mnt/tmp/pip-pip-egg-info-qjiuibm3/psycopg2.egg-info/top_level.txt
writing manifest file '/mnt/tmp/pip-pip-egg-info-qjiuibm3/psycopg2.egg-info/SOURCES.txt'
/usr/local/lib/python3.7/site-packages/setuptools/config/setupcfg.py:516: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
warnings.warn(msg, warning_class)
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<https://www.psycopg.org/docs/install.html>).
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
This includes ideas from other posts, such as pip-compile, pip download, and pip install --dry-run.
Here are a few ways to reproduce the behavior I'm seeing. (Assume that requirements.in contains the name of the package we're trying to install. In the simplest case, it can just contain "psycopg2".)
pip-compile requirements.in --output-file requirements.txt
# or
pip download --dest /tmp --requirement requirements.in
# or
pip install --dry-run --requirement requirements.in |
sed -n 's/^Would install //p' |
tr ' ' '\n' |
sed 's/\(.*\)-/\1==/g' > requirements.txt
(Following the above, I intended to either grep the resolved requirements.txt, or search the download dir for psycopg2. But I don't get that far.)
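A minimal sketch of that grep step, assuming a resolved requirements.txt could somehow be produced. The helper names (normalize, pinned_names, needs_postgresql_devel) are hypothetical, and the name matching is a simplification that compares normalized project names so that psycopg2-binary does not count as psycopg2:

```python
import re
from typing import Set


def normalize(name: str) -> str:
    """Normalize a project name (lowercase, runs of -, _ and . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_names(requirements_text: str) -> Set[str]:
    """Collect the project names found in pip-compile style output."""
    names = set()
    for line in requirements_text.splitlines():
        # Drop trailing comments such as "# via -r requirements.in".
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as --hash=... or -r other.txt.
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def needs_postgresql_devel(requirements_text: str) -> bool:
    """True if psycopg2 itself (not psycopg2-binary) is pinned."""
    return "psycopg2" in pinned_names(requirements_text)
```

With this, needs_postgresql_devel(open("requirements.txt").read()) would decide whether to run the yum install, but only once the resolution step succeeds.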
As a last-ditch effort, I could run pip install --dry-run, capture the stderr, and parse it for the above message, but is there a more elegant way to tell if psycopg2 is a dependency (transitive or direct) without triggering PostgresConfig?
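For concreteness, the last-ditch approach might look roughly like the sketch below. The function names are hypothetical, the marker string is copied from the error output above, and checking stdout as well as stderr is a precaution rather than something I have confirmed is necessary:

```python
import subprocess
import sys

# Marker copied from the psycopg2 build error shown above.
PG_CONFIG_MARKER = "pg_config executable not found"


def mentions_missing_pg_config(pip_output: str) -> bool:
    """True if pip's output contains the pg_config error message."""
    return PG_CONFIG_MARKER in pip_output


def dry_run_hits_pg_config(requirements_path: str) -> bool:
    """Run a pip dry run and report whether it failed on pg_config."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--dry-run",
         "--requirement", requirements_path],
        capture_output=True,
        text=True,
    )
    # Only treat the marker as meaningful when pip actually failed.
    combined = result.stdout + result.stderr
    return result.returncode != 0 and mentions_missing_pg_config(combined)
```

This relies on the exact wording of an error message, which is why it feels fragile: a future psycopg2 release could reword it and the check would silently stop working.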
A general solution for determining whether a C extension would need to be compiled would also be helpful, if you know of one.