10

After running mlflow ui on a remote server, I'm unable to start mlflow ui again.
A workaround is to kill all my processes on the server using pkill -u MyUserName.
Otherwise I get the following error:

[INFO] Starting gunicorn 20.0.4  
[ERROR] Connection in use: ('127.0.0.1', 5000)
[ERROR] Retrying in 1 second.  
...
Running the mlflow server failed. Please see the logs above for details.

I understand the error, but I don't understand:
1. What is the correct way to shut down mlflow ui?
2. How can I identify the mlflow ui process so that I can kill only that process instead of using pkill?

Currently I close the browser or press Ctrl+C.

Suhas_Pote
skibee

5 Answers

8

I ran into a similar problem recently when calling mlflow ui on a remote server. Pressing Ctrl+C in the command line usually works to exit. However, when it doesn't, pkill -f gunicorn solves the problem. Note that you can also use ps -A | grep gunicorn to find the process first and then kill [PID] manually. A similar problem seems to have been discussed here once.
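As a rough sketch of the ps-and-kill approach, the filter below narrows the match to gunicorn processes whose command line mentions mlflow.server:app, so that unrelated gunicorn services on the same machine are left alone. The function name mlflow_gunicorn_pids is made up for illustration, and the mlflow.server:app pattern is an assumption about how your MLflow version launches gunicorn; check your own ps output before relying on it.

```shell
# Hypothetical helper: read "PID COMMAND..." lines on stdin and print the PIDs
# of gunicorn processes that serve the MLflow app.
# Assumption: the command line contains "mlflow.server:app".
mlflow_gunicorn_pids() {
  awk '/gunicorn/ && /mlflow\.server:app/ { print $1 }'
}

# Usage (inspect the PIDs before killing anything):
#   ps -A -o pid= -o args= | mlflow_gunicorn_pids
#   kill $(ps -A -o pid= -o args= | mlflow_gunicorn_pids)
```

Listing the PIDs first and killing second keeps the step reviewable, which a bare pkill -f gunicorn does not.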

Moore
5

If you can't start MLflow because it is already running, the following lists the process that is holding port 5000. Kill the PID it reports, and then you can spawn a new UI:

lsof -i :5000
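A minimal sketch of turning that listing into a kill, assuming lsof is installed: the -t option makes lsof print bare PIDs, and restricting the match to listening sockets avoids picking up a browser on the same machine that merely has a connection open to the UI. The helper name pids_on_port is hypothetical.

```shell
# Hypothetical helper: print the PIDs of processes listening on a TCP port.
# -t prints bare PIDs; -sTCP:LISTEN skips client connections to that port.
pids_on_port() {
  case $1 in
    ''|*[!0-9]*) echo "usage: pids_on_port PORT" >&2; return 2 ;;
  esac
  lsof -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# Usage:
#   pids_on_port 5000            # show what would be killed
#   kill $(pids_on_port 5000)    # then stop it
```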

Also, MLflow lets you pass --port to choose the port number, which avoids confusion when you need several instances running at once, e.g. one for tracking and one for serving. By default the server runs on port 5000. If that port is already in use, use the --port option to specify a different port:

mlflow models serve -m runs:/<RUN_ID>/model --port 1234

UPDATE June 2022: You can also add the --port flag to the command described here to set up MLflow properly: How do you start using MLflow SQL storage instead of the file system storage?

joe hoeller
3

I was getting an error when running the mlflow ui command.

The error was:

[2022-04-19 10:48:02 -0400] [89933] [INFO] Starting gunicorn 20.1.0
[2022-04-19 10:48:02 -0400] [89933] [ERROR] Connection in use: ('127.0.0.1', 5000)
[2022-04-19 10:48:02 -0400] [89933] [ERROR] Retrying in 1 second.
[2022-04-19 10:48:03 -0400] [89933] [ERROR] Connection in use: ('127.0.0.1', 5000)
[2022-04-19 10:48:03 -0400] [89933] [ERROR] Retrying in 1 second.
[2022-04-19 10:48:04 -0400] [89933] [ERROR] Connection in use: ('127.0.0.1', 5000)
[2022-04-19 10:48:04 -0400] [89933] [ERROR] Retrying in 1 second.
[2022-04-19 10:48:05 -0400] [89933] [ERROR] Connection in use: ('127.0.0.1', 5000)
[2022-04-19 10:48:05 -0400] [89933] [ERROR] Retrying in 1 second.
[2022-04-19 10:48:06 -0400] [89933] [ERROR] Connection in use: ('127.0.0.1', 5000)
[2022-04-19 10:48:06 -0400] [89933] [ERROR] Retrying in 1 second.
[2022-04-19 10:48:07 -0400] [89933] [ERROR] Can't connect to ('127.0.0.1', 5000)

Solution that worked for me:

Step 1: Get the process id

ps -A | grep gunicorn

20734 ?? 0:39.17 /usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python /Users/XXX/env/bin/gunicorn -b 127.0.0.1:5000 -w 1 mlflow.server:app

Step 2: Take the PID from the output above and kill the process that is holding the port

kill 20734
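Plain kill sends SIGTERM, which gunicorn treats as a request for a graceful shutdown. The sketch below, with the hypothetical name stop_pid, wraps step 2 so that it waits for the process to exit and only falls back to SIGKILL after a timeout. The 10-second default is an arbitrary assumption.

```shell
# Hypothetical helper: ask a process to exit, wait, and force it only if needed.
# Usage: stop_pid PID [TIMEOUT_SECONDS]
stop_pid() {
  target=$1
  remaining=${2:-10}                            # assumed default timeout
  kill -TERM "$target" 2>/dev/null || return 0  # nothing to stop
  while [ "$remaining" -gt 0 ]; do
    kill -0 "$target" 2>/dev/null || return 0   # process has exited
    sleep 1
    remaining=$((remaining - 1))
  done
  kill -KILL "$target" 2>/dev/null              # last resort
}

# Usage with the PID found in step 1:
#   stop_pid 20734
```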

2

Quick solution:

Simply kill the process

fuser -k 5000/tcp

Command syntax

fuser -k <port>/tcp

Bonus: fuser 5000/tcp prints the PID of the process bound to that port.

Note: this works on Linux only. A more portable option is lsof -i4 (or -i6 for IPv6).

Suhas_Pote
0

By default, the mlflow UI binds to port 5000, so the subsequent invocation will result in a port busy error.

You can launch multiple MLflow UIs, giving each one a different port number:

Usage: mlflow ui [OPTIONS]

  Launch the MLflow tracking UI for local viewing of run results. To launch
  a production server, use the "mlflow server" command instead.

  The UI will be visible at http://localhost:5000 by default, and only
  accept connections from the local machine. To let the UI server accept
  connections from other machines, you will need to pass ``--host 0.0.0.0``
  to listen on all network interfaces (or a specific interface address).

Options:
  --backend-store-uri PATH     URI to which to persist experiment and run
                               data. Acceptable URIs are SQLAlchemy-compatible
                               database connection strings (e.g.
                               'sqlite:///path/to/file.db') or local
                               filesystem URIs (e.g.
                               'file:///absolute/path/to/directory'). By
                               default, data will be logged to the ./mlruns
                               directory.
  --default-artifact-root URI  Path to local directory to store artifacts, for
                               new experiments. Note that this flag does not
                               impact already-created experiments. Default:
                               ./mlruns
  -p, --port INTEGER           The port to listen on (default: 5000).
  -h, --host HOST              The network address to listen on (default:
                               127.0.0.1). Use 0.0.0.0 to bind to all
                               addresses if you want to access the tracking
                               server from other machines.
  --help                       Show this message and exit.

Try it and see what happens.
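If you start a second UI on another port, it helps to record its PID at launch so that exactly that instance can be stopped later without searching for it. Here is a minimal sketch, assuming a POSIX shell; the helper name start_with_pidfile and the pidfile path are made up for illustration. Whether signalling the recorded PID also stops the gunicorn workers behind it is an assumption that may depend on your MLflow version, so confirm with ps afterwards.

```shell
# Hypothetical helper: run a command in the background and save its PID.
# Usage: start_with_pidfile PIDFILE COMMAND [ARGS...]
start_with_pidfile() {
  pidfile=$1
  shift
  "$@" >/dev/null 2>&1 &      # discard output; redirect to a log file if needed
  echo $! > "$pidfile"        # $! is the PID of the background command
}

# Usage:
#   start_with_pidfile "$HOME/mlflow-ui-5001.pid" mlflow ui --port 5001
#   kill "$(cat "$HOME/mlflow-ui-5001.pid")"    # later, stop only that instance
```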

Jules Damji
    Thank you for your reply and sorry for my late response. This works - and is probably a better workaround than the `pkill` option. However the old process is still running ... – skibee Mar 09 '20 at 06:55