I am using Docker combined with a virtualenv to run a project for a client, but I am getting a ModuleNotFoundError for sklearn.
In my Pipfile I have added the numpy dependency
numpy = "==1.21.6"
The error is thrown from the following line
np.load(PATH_TO_NPY_FILE, allow_pickle=True)
with the following stack trace:
development_1 | File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
development_1 | pickle_kwargs=pickle_kwargs)
development_1 | File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/numpy/lib/format.py", line 748, in read_array
development_1 | array = pickle.load(fp, **pickle_kwargs)
development_1 | ModuleNotFoundError: No module named 'sklearn'
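As far as I can tell from the trace, the failure happens inside pickle.load rather than in numpy's own code. Unpickling re-imports the module that defined each stored object, so a file holding objects from another package needs that package to be importable at load time. Below is a minimal, stdlib-only sketch of that behaviour. The names fake_estimators, Model and try_load are hypothetical stand-ins for illustration, not names from my project.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party package such as sklearn.
fake = types.ModuleType("fake_estimators")


class Model:
    """Placeholder for a class defined in the third-party package."""


# Make the class look as if it was defined inside the fake package.
Model.__module__ = "fake_estimators"
Model.__qualname__ = "Model"
fake.Model = Model
sys.modules["fake_estimators"] = fake

# Pickling stores only the module and class name, not the class itself.
payload = pickle.dumps(Model())

# Simulate an environment where the package is not installed.
del sys.modules["fake_estimators"]


def try_load(data):
    """Return 'ok' if data unpickles, otherwise the import error message."""
    try:
        pickle.loads(data)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


result = try_load(payload)
print(result)
```

The message has the same shape as the one in my trace, only with the stand-in module name instead of sklearn.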
I find this strange, because sklearn should be installed as part of the numpy dependency tree, right?
Still, I tried the suggestions I found in other posts, such as explicitly adding the following command to my Dockerfile
python -m pip install scikit-learn scipy matplotlib
However, the error still persists.
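To narrow this down, here is a small diagnostic sketch that reports which interpreter is running and whether a given module is importable from it. The helper name describe_environment is my own and purely illustrative. The idea is to run it inside the container once with plain python and once with pipenv run python, since the CMD in my Dockerfile uses the latter.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report the active interpreter and whether module_name can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root of the active environment
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment("sklearn").items():
        print(f"{key}: {value}")
```

If the two runs print different executable paths, and sklearn is importable in only one of them, then the package was installed into a different environment than the one that runs main.py.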
For completeness, I'll provide some extra info below, although the key question remains why installing numpy does not also install its sub-dependencies.
Project structure
The project is a kind of bridge between SQS on the one hand and the client's logic on the other. The code that throws the error lives in a git submodule, while the Pipfile sits in the top-level repo. The submodule does not contain a Pipfile. The submodules folder has an __init__.py file because it contains functions that I want to use in my src code.
In the tree below, my code is in main.py and the error-throwing code is in submodules/module2/bar.py
.
|- src/
| |- main.py
|
|- submodules/
| |- module1
| | |- foo.py
| | |- setup.py
| |
| |- module2
| | |- bar.py
| |
| |- __init__.py
|
|- .gitmodules
|- Pipfile
|- Dockerfile
Dockerfile contents
Note that at this point the Dockerfile is a bit of an aggregate of solutions I took from other posts on the matter. That's why both pip install scikit-learn and apt-get install python3-sklearn are currently included. I will prune it once I have finally fixed this issue.
FROM python:3.7
WORKDIR code/
COPY Pipfile .
COPY submodules/ submodules/
RUN pip install pipenv && \
    pipenv install --deploy && \
    python -m pip install scikit-learn scipy matplotlib && \
    apt-get update && \
    apt-get install -y locales ffmpeg libsm6 libxext6 libxrender-dev python3-sklearn && \
    sed -i -e 's/# nl_BE.UTF-8 UTF-8/nl_BE.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales
ENV LANG nl_BE.UTF-8
ENV LC_ALL nl_BE.UTF-8
COPY .env .
COPY src/ .
COPY data/ data
CMD [ "pipenv", "run", "python", "main.py" ]
Pipfile contents
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
python-dotenv = "*"
boto3 = "*"
pySqsListener = "*"
xpress = "==9.0.5"
module1 = {path = "./submodules/module1"}
pandas = "==1.3.4"
numpy = "==1.21.6"
[dev-packages]
[requires]
python_version = "3.7"