I'm working on a tool that lets users run a Jupyter notebook with PySpark on an AWS server and forward the port to their localhost so they can connect to the notebook.
I've been using subprocess.Popen to ssh into the remote server and start the PySpark shell/notebook, but I can't stop it from printing everything straight to the terminal. What I WANT is to perform an action on each line of output so I can retrieve the port number.
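(For context, the per-line action I have in mind is roughly the sketch below; the sample log line and the regex are my own illustration, not code from the tool.)

```python
import re

# Hypothetical example of the kind of line jupyter-notebook prints on startup
line = b"[I 16:13:22.000 NotebookApp] http://localhost:8888/?token=abc123"

# Pull the port number out of the first URL-looking token on the line
match = re.search(rb"https?://[^\s/]+:(\d+)", line)
if match:
    port = int(match.group(1).decode())
    print(port)  # 8888
```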
For example, running this (following the most popular answer here: Read streaming input from subprocess.communicate()):
import subprocess

command = "jupyter-notebook"
con = subprocess.Popen(['ssh', node, command], stdout=subprocess.PIPE, bufsize=1)
with con.stdout:
    for line in iter(con.stdout.readline, b''):
        print(line)
con.wait()
this seems to bypass the context manager entirely; the con process writes its output straight to the terminal, so the following is immediately printed:
[I 16:13:20.783 NotebookApp] [nb_conda_kernels] enabled, 0 kernels found
[I 16:13:21.031 NotebookApp] JupyterLab extension loaded from /home/*****/miniconda3/envs/aws/lib/python3.7/site-packages/jupyterlab
[I 16:13:21.031 NotebookApp] JupyterLab application directory is /data/data0/home/*****/miniconda3/envs/aws/share/jupyter/lab
[I 16:13:21.035 NotebookApp] [nb_conda] enabled
...
...
...
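(Aside: this is a sketch rather than code from my tool. I know subprocess can fold stderr into the stdout pipe with stderr=subprocess.STDOUT, shown here with a local stand-in command, in case those log lines are actually being written to stderr.)

```python
import subprocess

# Stand-in for the ssh/jupyter command: writes one line to stderr
con = subprocess.Popen(
    ["sh", "-c", "echo 'to stderr' >&2"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # fold stderr into the stdout pipe
)
with con.stdout:
    lines = [line for line in iter(con.stdout.readline, b"")]
con.wait()
print(lines)
```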
I can get the context manager to work when I instead call a simple script like the one below (where command = "bash random_script.sh"):
# random_script.sh
for i in $(seq 1 100)
do
echo "some output: $i"
sleep 2
done
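Concretely, the working case looks like this when the loop is inlined with bash -c (shortened to three iterations so the snippet is self-contained):

```python
import subprocess

# Inline, shortened version of random_script.sh for illustration
script = 'for i in $(seq 1 3); do echo "some output: $i"; done'
con = subprocess.Popen(["bash", "-c", script], stdout=subprocess.PIPE)
lines = []
with con.stdout:
    for line in iter(con.stdout.readline, b""):
        lines.append(line.decode().rstrip())
con.wait()
print(lines)
```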
This behaves as expected, and I can perform an action per line inside the with block. Is there something fundamentally different about the jupyter version that prevents it from behaving the same way?