
I have a TensorFlow model training on an Ubuntu 16.04 virtual machine on Azure. Suddenly, the TensorBoard process is no longer reachable from outside. The Network Security Group should be configured properly (see picture), and it worked until this evening. I didn't change anything on the machine. Is there anything I can check? Any hints? Thanks!

NSG

petrux
  • I'm assuming you confirmed that the public IP you are using is still assigned to the VM? – CtrlDot Jun 29 '17 at 19:10
  • I am logging in via SSH, so the VM itself is reachable. However, I never used the public IP, only `[name]-[resource].eastus.cloudapp.azure.com` – petrux Jun 29 '17 at 19:19
  • In your VM, run `netstat -ant|grep 6006`. Is the port listening? – Shui shengbao Jun 30 '17 at 02:58
  • Does this answer your question? [How can I run Tensorboard on a remote server?](https://stackoverflow.com/questions/37987839/how-can-i-run-tensorboard-on-a-remote-server) – desertnaut Jun 15 '20 at 15:46

2 Answers


Run `netstat -ant|grep 6006` on the VM (TensorBoard listens on port 6006 by default). You should get the following result.

shui@shui:~$ netstat -ant|grep 6006
tcp        0      0 0.0.0.0:6006            0.0.0.0:*               LISTEN     
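
When reading that output, the local address column matters: a port bound to `0.0.0.0` accepts connections on every interface, while one bound to `127.0.0.1` can only be reached from the VM itself. The sketch below is a hypothetical helper (`classify_listen` is not part of any tool) that reads `netstat -ant` output on stdin and reports how the given port is bound. It assumes the Linux net-tools column layout shown above.

```shell
# Hypothetical helper: classify how a TCP port is bound, from `netstat -ant` output on stdin.
# Assumes the net-tools layout: proto recv-q send-q local-address foreign-address state.
classify_listen() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      n = split($4, parts, ":")            # local address, e.g. 0.0.0.0:6006 or :::6006
      if (parts[n] != port) next           # last piece is the port number
      addr = substr($4, 1, length($4) - length(parts[n]) - 1)
      if (addr == "127.0.0.1" || addr == "::1") {
        if (result == "") result = "loopback-only"
      } else {
        result = "non-loopback"            # 0.0.0.0, ::, or a specific interface address
      }
    }
    END { print (result == "" ? "not-listening" : result) }
  '
}

# Usage on the VM:
#   netstat -ant | classify_listen 6006
```

A `non-loopback` result only says the process accepts remote connections; the Network Security Group can still block them.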

From your description, I think the port is not listening. If you start TensorBoard with only `tensorboard --logdir=run1:/tmp/tensorflow/`, the process stops when the SSH session expires or is closed, and you can no longer connect to it. Start it with the following command instead, so that it stays reachable even after the SSH session expires or is closed.

nohup tensorboard --logdir=run1:/tmp/tensorflow/ &

From man nohup:

nohup - run a command immune to hangups, with output to a non-tty

Append & to the command line to run it in the background.

As an alternative to nohup, you can achieve a similar result by running TensorBoard within a screen session:

:~$ screen -S tensorboard-screen
:~$ tensorboard --logdir=run1:/tmp/tensorflow/

Then press Ctrl + a, then d, to detach the screen and return to the main shell. When you exit the SSH session, the screen session keeps running. Once logged back in, type `screen -r tensorboard-screen` to resume it.

petrux
Shui shengbao
  • I am running the tensorboard in a `screen` session and, moreover, I think it's the same, right? – petrux Jun 30 '17 at 06:20
  • When you run TensorBoard in a `screen` session and the session is closed or expires, you cannot access `tensorboard`. – Shui shengbao Jun 30 '17 at 06:24
  • I know, but the process was still running when I tried to access it. Moreover, ssh logging into the remote machine and trying a `wget` on port 6006 worked perfectly. – petrux Jun 30 '17 at 06:26
  • When you close the `session`, can you still access port 6006? I suggest running the service in the background. – Shui shengbao Jun 30 '17 at 06:28
  • I see. Actually, the service is running in the background in a screen session. I am running it right now and can access TensorBoard without any problem, by the way... that's weird. – petrux Jun 30 '17 at 06:31
  • @petrux Can you access TensorBoard now? In your question you could not, but now the port is listening and you can access it. What did you do? – Shui shengbao Jun 30 '17 at 06:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/148017/discussion-between-walter-msft-and-petrux). – Shui shengbao Jun 30 '17 at 06:33
  • Update: even without nohup, everything is working. Maybe it was something related to the tensorboard? I will inspect deeper later and will follow up. – petrux Jun 30 '17 at 07:40
  • No, I don't think TensorBoard would stop the service automatically. I will test and check the logs. – Shui shengbao Jun 30 '17 at 07:42
  • Right now I am working on other stuff. I will start training my models again on Monday. I tried just once (so far) but could not reproduce the problem. – petrux Jul 07 '17 at 15:04
  • I am running the same config right now but cannot reproduce the problem. – petrux Jul 10 '17 at 17:45
  • @petrux Thanks for your reply. You could continue to observe it, but I think it is not a configuration issue. – Shui shengbao Jul 12 '17 at 08:30
  • Before I accept the answer, could you please edit it (apparently I can't) to also add `screen` as an alternative to `nohup`? Thanks! – petrux Jul 13 '17 at 13:26
  • Hi, you can edit my answer. I will approve it tomorrow. Sorry, I am not at the office. – Shui shengbao Jul 13 '17 at 13:40

If you have just set up a VM and then try to run TensorBoard, it won't work until you add an inbound port rule. You can follow this blog post to set up the port correctly.

It can take a few minutes for the changes to apply. When I tried to reach TensorBoard immediately after changing the rules, it kept reporting a bad request.
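
Because the rule change may need a few minutes to take effect, one option is to poll the port from your local machine instead of retrying by hand. This is a minimal sketch, assuming `curl` is installed locally; `retry_until` is a hypothetical helper and `<vm-host>` is a placeholder for your VM's DNS name or public IP.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt is still to come.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage from your local machine (placeholder host; 6006 is TensorBoard's default port):
#   retry_until 20 15 curl --silent --fail --max-time 5 --output /dev/null "http://<vm-host>:6006/"
```

With `--fail`, curl treats HTTP error responses such as a bad request as failures, so the loop keeps waiting until the page actually loads.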

Ben