I'm trying to use a regular EBS persistent storage volume in OpenShift Online Next Gen, and getting the following error when attempting to deploy:
Unable to mount volumes for pod "production-5-vpxpw_instanttabletop(d784f054-a66b-11e7-a41e-0ab8769191d3)": timeout expired waiting for volumes to attach/mount for pod "instanttabletop"/"production-5-vpxpw". list of unattached/unmounted volumes=[volume-mondv]
Followed (after a while) by multiple instances of:
Failed to attach volume "pvc-702876a2-a663-11e7-8348-0a69cdf75e6f" on node "ip-172-31-61-152.us-west-2.compute.internal" with: Error attaching EBS volume "vol-0fb5515c87914b844" to instance "i-08d3313801027fbc3": VolumeInUse: vol-0fb5515c87914b844 is already attached to an instance status code: 400, request id: 54dd24cc-6ab0-434d-85c3-f0f063e73099
The log for the deploy pod looks like this after it all times out:
--> Scaling production-5 to 1
--> Waiting up to 10m0s for pods in rc production-5 to become ready
W1001 05:53:28.496345 1 reflector.go:323] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:509: watch of *api.Pod ended with: too old resource version: 1455045195 (1455062250)
error: update acceptor rejected production-5: pods for rc "production-5" took longer than 600 seconds to become ready
I thought at first that this might be related to this issue, but the only running pods are the deploy pod and the one that's trying to start, and I've switched to a Recreate strategy as suggested there, with no results.
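For reference, the strategy switch lives in the DeploymentConfig. A minimal sketch (the name "production" is inferred from the pod names above and may differ in your project); the relevant part is that Recreate stops the old pod before starting the new one, so the old pod can release the EBS volume (which can only be attached to one node at a time) before the replacement tries to mount it:

```yaml
# DeploymentConfig excerpt - switch from the default Rolling strategy
# to Recreate so the old pod detaches the EBS volume first.
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: production   # hypothetical name, inferred from the pod names
spec:
  strategy:
    type: Recreate
```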
Things did deploy and run normally the very first time, but since then I haven't been able to get it to deploy successfully.
Can anyone shed a little light on what I'm doing wrong here?
Update #1:
As an extra wrinkle, sometimes when I deploy it takes what seems like a long time to spin up the deploy pod (I don't actually know how long it should take, but I get a warning suggesting things are going slowly, and my current deploy has been sitting for 15+ minutes so far without standing up).
In the deploy pod's event list, I'm seeing multiple instances each of "Error syncing pod" and "Pod sandbox changed, it will be killed and re-created." as I wait, having touched nothing.
Doesn't happen every time, and I haven't discerned a pattern.
Not sure if this is even related, but seemed worth mentioning.
Update #2:
I tried deploying again this morning, and after canceling one deploy which was experiencing the issue described in my first update above, things stood up successfully.
I made no changes as far as I'm aware, so I'm baffled as to what the issue is or was here. I'll make a further update as to whether or not the issue recurs.
Update #3:
After a bunch of further experimentation, I seem to be able to get my pod up and running regularly now. I didn't change anything about the configuration, so I assume this is something to do with sequencing, but even now it's not without some irregularities:
If I start a deploy, the existing running pod hangs indefinitely in the terminating state according to the console, and stays that way until it's hard-deleted (without being given a chance to shut down gracefully). Until that happens, it continues to produce the error described above (as you'd expect).
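Forcing the stuck pod out can be done with standard `oc`/`kubectl` flags. A sketch, using the pod name from the error above as a stand-in for whatever the console shows:

```shell
# Force-delete a pod stuck in Terminating so it releases its EBS volume.
# --grace-period=0 --force skips the graceful-shutdown wait.
oc delete pod production-5-vpxpw --grace-period=0 --force

# Then watch for the replacement pod to attach the volume and start.
oc get pods -w
```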
Frankly, this doesn't make sense to me, compared to the issues I was having last night - I had no other pods running when I was getting these errors before - but at least it's progress in some form.
I'm having some other issues once my server is actually up and running (requests not making it to the server, and issues trying to upgrade to a websocket connection), but those are almost certainly separate, so I'll save them for another question unless someone tells me they're actually related.
Update #4:
OpenShift's ongoing issue listing hasn't changed, but things seem to be loading correctly now, so marking this as solved and moving on to other things.
For posterity: changing from Rolling to Recreate is key here, and even then you may need to manually kill the old pod if it gets stuck trying to shut down gracefully.
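If you'd rather make the Rolling-to-Recreate change from the CLI instead of editing the YAML, a one-line patch like this should work (again assuming the DeploymentConfig is named "production"; substitute your own):

```shell
# Switch the deployment strategy to Recreate via a strategic merge patch.
oc patch dc/production -p '{"spec":{"strategy":{"type":"Recreate"}}}'
```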