I have to run a Python program on RedHat 8, so I first pull the RedHat UBI 8 image (the same tag as in the FROM line below):
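sudo docker pull redhat/ubi8:latest

Then I write the following Dockerfile: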
FROM redhat/ubi8:latest
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf && mkdir /home/spark && mkdir /home/spark/spark && mkdir /home/spark/ETL && mkdir /usr/lib/java && mkdir /usr/share/oracle
# set environment vars
ENV SPARK_HOME /home/spark/spark
ENV JAVA_HOME /usr/lib/java
# install packages
RUN \
    echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
    yum install -y rsync wget python3-pip openssh-server openssh-clients unzip python38 nano
# create ssh keys
RUN \
    echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
    mkdir -p ~/.ssh && \
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 0600 ~/.ssh/authorized_keys
# copy ssh config
COPY ssh_config /root/.ssh/config
COPY spark-3.1.2-bin-hadoop3.2.tgz /home/
COPY jdk-8u25-linux-x64.tar.gz /home/
COPY instantclient-basic-linux.x64-19.8.0.0.0dbru.zip /home/
COPY etl /home/spark/ETL/
RUN \
    tar -zxvf /home/spark-3.1.2-bin-hadoop3.2.tgz -C /home/spark && \
    mv -v /home/spark/spark-3.1.2-bin-hadoop3.2/* $SPARK_HOME && \
    tar -zxvf /home/jdk-8u25-linux-x64.tar.gz -C /home/spark && \
    mv -v /home/spark/jdk1.8.0_25/* $JAVA_HOME && \
    unzip /home/instantclient-basic-linux.x64-19.8.0.0.0dbru.zip -d /home/spark && \
    mv -v /home/spark/instantclient_19_8 /usr/share/oracle && \
    echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc && \
    echo "export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:/usr/share/oracle/instantclient_19_8" >> ~/.bashrc && \
    echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/share/oracle/instantclient_19_8" >> ~/.bashrc && \
    echo "export PYTHONPATH=$PYTHONPATH:/usr/bin/python3.8" >> ~/.bashrc && \
    echo "alias python=/usr/bin/python3.8" >> ~/.bashrc
# pip warns: "Running pip install with root privileges is generally not a good idea. Try `python3.8 -m pip install --user` instead."
# so I create a non-root user for the pip installs
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf
RUN useradd -d /home/spark/myuser myuser
USER myuser
WORKDIR /home/spark/myuser
ENV PATH="/home/spark/myuser/.local/bin:$PATH"
RUN \
    python3.8 -m pip install --user \
        pandas cx-Oracle persiantools pyspark py4j \
        python-dateutil pytz setuptools six numpy
# copy spark configs
COPY spark-env.sh $SPARK_HOME/conf/
COPY workers $SPARK_HOME/conf/
# expose various ports
EXPOSE 7012 7013 7014 7015 7016 8881 8081 7077
Also, I copy the required config files next to the Dockerfile and build the image with this script:
#!/bin/bash
cp /etc/ssh/ssh_config .
cp /opt/spark/conf/spark-env.sh .
cp /opt/spark/conf/workers .
sudo docker build -t my_docker .
echo "Script Finished."
The image built without any errors. Then I made a tar file from the resulting image with this command:
sudo docker save my_docker > my_docker.tar
After that, I copied my_docker.tar to another computer and loaded it:
sudo docker load < my_docker.tar
sudo docker run -it my_docker
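Inside the container I start the program like this (the exact invocation is illustrative; the path matches the traceback below):

python3.8 /home/spark/ETL/test/main.py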
Unfortunately, when the program runs inside the Docker container, I receive errors about Python packages such as numpy, pyspark, and pandas:
File "/home/spark/ETL/test/main.py", line 3, in <module>
import cst_utils as cu
File "/home/spark/ETL/test/cst_utils.py", line 5, in <module>
import group_state as gs
File "/home/spark/ETL/test/group_state.py", line 1, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
I also tried installing the Python packages directly inside a running container and then committing the container. But when I exit and enter the container again, none of the Python packages are installed.
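For reference, this is roughly the sequence I use there (the container ID is a placeholder):

sudo docker run -it my_docker
# inside the container:
python3.8 -m pip install --user numpy pandas pyspark
exit
# back on the host, snapshot that container into the image:
sudo docker commit <container-id> my_docker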
Would you please point out what is wrong with my approach?
Any help is really appreciated.