I have Cloud Foundry deployed on Azure machines using BOSH. To add another node to the cluster, I changed the Cloud Foundry deployment manifest and redeployed with bosh deploy.

The deployment failed partway through, but the BOSH deployment lock was not released. When I run bosh locks, I see that the lock is still held by the deployment; its expiry time is always slightly later than the current time and keeps being extended indefinitely.

bosh locks
Acting as user 'admin' on 'bosh'

+------------+-----------------------+-------------------------+
| Type       | Resource              | Expires at              |
+------------+-----------------------+-------------------------+
| deployment | single-vm-cf-on-azure | 2017-05-23 10:27:59 UTC |
+------------+-----------------------+-------------------------+

I tried cancelling the deployment task (bosh cancel task #task-number). The task's state changed to cancelling, but the task never actually got cancelled.

bosh tasks
Acting as user 'admin' on 'bosh'

+----+------------+-------------------------+-------+-------------------+--------+
| #  | State      | Timestamp               | User  | Description       | Result |
+----+------------+-------------------------+-------+-------------------+--------+
| 38 | cancelling | 2017-05-23 08:40:12 UTC | admin | create deployment |        |
+----+------------+-------------------------+-------+-------------------+--------+

The problem is that BOSH still holds the deployment lock, so every time I try to start the deployment again or delete the deployment, I get this error:

Error 100: Unable to get deployment lock, maybe a deployment is in progress. Try again later.

1. Can I delete the stored lock information to release the lock? If so, where is it stored and how do I delete it?

2. If a task (for example, bosh deploy) fails, does it hold its BOSH lock indefinitely? Is there a way to handle task failure gracefully?

3. How should I run bosh deploy after changing the deployment manifest so that I don't end up with a deployment lock that is never released?

Thanks in advance

Eddie
doit
  • You have a task that is running, so it's maintaining the lock. Unfortunately, canceling the task does not seem to be working. If you run `bosh task 38` to view the task output, what do you get? – Daniel Mikusa May 23 '17 at 18:59
  • @DanielMikusa When we cancel a task it tries to find a safe point. In my case I think it didn't find any safe point hence was taking forever to cancel. But I logged into bosh director and deleted the lock acquired by the task. That resolved the deployment lock issue. – doit May 25 '17 at 05:37
  • That's probably not safe and I don't see how that would address the issue of your stuck task. Were you able to get the task unstuck? The safest course of action is to figure out why the task is stuck and take action to unstick it. Another option would be to restart the director, but if you don't know where the task was stuck, you don't know the state of your system and that can be unsafe too. – Daniel Mikusa May 25 '17 at 14:40
  • By the way, you should use BOSH cli v2. Maybe this issue is fixed in the new CLI. – muehsi May 26 '17 at 07:15

1 Answer

We can SSH to the BOSH director VM and delete the lock manually.

When we deploy BOSH, the private key required to connect to the BOSH director is stored in the home directory as 'bosh'.

The same information is also present in the bosh.yml file under the 'ssh_tunnel' section. In my case it looked like this:

ssh_tunnel:
    host: 10.0.0.4
    port: 22
    user: vcap
    private_key: ~/bosh

Steps to connect:

  1. ssh -i ~/bosh vcap@10.0.0.4
  2. cd /var/vcap/packages/postgres/bin
  3. ./psql -U postgres -p 5524 bosh
  4. Delete the deployment's lock entry from the 'locks' table
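
Step 4 could look roughly like the sketch below. It only builds and prints the SQL statements, so nothing is changed until you pass them to psql yourself. The lock name format (lock:deployment:<name>) and the "name" column are assumptions about the director's schema and are not confirmed in this post; run the SELECT first and adjust the DELETE to match what the table actually contains.

```shell
#!/bin/sh
# Deployment name taken from the `bosh locks` output in the question.
deployment="single-vm-cf-on-azure"

# ASSUMPTION: the director stores deployment locks under the name
# "lock:deployment:<deployment>" in a column called "name".
# Check the SELECT output before running the DELETE.
lock_name="lock:deployment:${deployment}"

# List every row first so the exact lock entry can be confirmed.
select_sql="SELECT * FROM locks;"
# Remove only the row belonging to this deployment.
delete_sql="DELETE FROM locks WHERE name = '${lock_name}';"

# Print the statements for review.
printf '%s\n' "$select_sql" "$delete_sql"

# On the director VM, from /var/vcap/packages/postgres/bin (step 2),
# each statement can then be passed to psql with -c, for example:
#   ./psql -U postgres -p 5524 bosh -c "$select_sql"
```

As noted in the comments on the question, removing the lock while the task is still stuck in cancelling does not stop that task, so the deployment may be left in an unknown state; check the task output before relying on this.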
doit