
I need to install pyspark, which depends on pypandoc. If I first run pip install pypandoc and then pip install pyspark, everything works fine. However, because of some requirements, I need to install my dependencies from a requirements.txt file. So I put both pypandoc and pyspark in requirements.txt (pypandoc first, followed by pyspark) and ran pip install -r requirements.txt. This time, however, the installation fails with the following error:

 Complete output from command python setup.py egg_info:
Could not import pypandoc - required to package PySpark
Download error on https://pypi.org/simple/pypandoc/: [Errno 97] Address family not supported by protocol -- Some packages may not be found!
Couldn't find index page for 'pypandoc' (maybe misspelled?)
Download error on https://pypi.org/simple/: [Errno 97] Address family not supported by protocol -- Some packages may not be found!
No local packages or working download links found for pypandoc
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-6vmbjchu/pyspark/setup.py", line 224, in <module>
    'Programming Language :: Python :: Implementation :: PyPy']
  File "/usr/local/lib/python3.6/site-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/usr/local/lib/python3.6/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/usr/local/lib/python3.6/site-packages/setuptools/dist.py", line 724, in fetch_build_eggs
    replace_conflicting=True,
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 782, in resolve
    replace_conflicting=replace_conflicting
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/usr/local/lib/python3.6/site-packages/setuptools/dist.py", line 791, in fetch_build_egg
    return cmd.easy_install(req)
  File "/usr/local/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 673, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pypandoc')

So it looks like, when I install this way, pypandoc is not yet installed at the point where pip tries to install pyspark. How can I fix this issue?
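One possible workaround, sketched in Python under stated assumptions: since the manual two-step install works, reproduce it by running a separate pip install for each line of requirements.txt, in file order, so each package is fully installed before the next one's setup.py runs. As far as I can tell, pip install -r prepares metadata for every requirement (running pyspark's setup.py egg_info) before installing any of them, so putting pypandoc first in the file does not make it importable in time. setuptools then tries to fetch pypandoc itself through easy_install, which is the step that fails above with the Errno 97 download error. The helper names read_requirements and install_in_order are hypothetical, and the sketch assumes requirements.txt contains only plain requirement specifiers (no -r, --index-url or other option lines).

```python
import re
import subprocess
import sys
from typing import Callable, List, Sequence


def read_requirements(text: str) -> List[str]:
    """Return requirement specifiers in file order, skipping blanks and comments.

    Assumption: the file holds only plain specifiers such as "pypandoc" or
    "pyspark"; pip option lines (-r, --index-url, ...) are not handled.
    """
    reqs = []
    for raw in text.splitlines():
        # A comment starts with "#" at line start or after whitespace,
        # so URL fragments like "...#egg=name" are kept intact.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if line:
            reqs.append(line)
    return reqs


def install_in_order(
    reqs: Sequence[str],
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> List[List[str]]:
    """Run one "pip install" per requirement, strictly in the given order.

    Each pip process finishes before the next starts, so a package listed
    earlier (e.g. pypandoc) is importable when a later one (e.g. pyspark)
    runs its setup.py. Returns the commands that were run.
    """
    commands = []
    for req in reqs:
        # Use the current interpreter's pip so everything lands in one env.
        cmd = [sys.executable, "-m", "pip", "install", req]
        run(cmd)  # check_call raises CalledProcessError if pip fails
        commands.append(cmd)
    return commands
```

Usage, from the directory containing requirements.txt: install_in_order(read_requirements(open("requirements.txt").read())). The trade-off is that pip no longer resolves versions across the whole file at once; each package is resolved on its own, the same as with the manual two-step install.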

HHH
  • It works fine for me with requirements.txt. I just tried it, without pinning specific versions in requirements.txt. Maybe it is related to your Python environment; you could create a new environment and try again. – ulysses Mr Jun 19 '19 at 15:43
  • Did you put both libraries in your requirements.txt file? – HHH Jun 19 '19 at 15:48
  • The answer may be here: https://stackoverflow.com/questions/56652619/problem-in-building-a-docker-image-with-pyspark-lib/59949447#59949447 – Boris Azanov Jan 28 '20 at 13:08
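The first comment's suggestion to retry in a new environment can be sketched with Python's standard venv module. This is a minimal sketch, not a confirmed fix: the helper name make_fresh_env and the directory name "fresh-env" are hypothetical, and with_pip=True relies on ensurepip being available (some Linux distributions package it separately).

```python
import os
import venv
from pathlib import Path


def make_fresh_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a clean virtual environment and return its Python interpreter path."""
    # clear=True wipes any previous contents so the environment starts empty;
    # symlinking the interpreter is the usual default outside Windows.
    venv.create(env_dir, clear=True, with_pip=with_pip, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe
```

Usage: python_exe = make_fresh_env("fresh-env"), then subprocess.check_call([str(python_exe), "-m", "pip", "install", "-r", "requirements.txt"]) to repeat the failing install in isolation from packages already present in the original environment.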
