
I am running two Kubernetes clusters on CoreOS in AWS (acc and prod). On both I have set up a custom registry behind nginx with SSL termination (bought wildcard certs, verified OK) proxying to v1 + v2 backends, and everything was running fine. Now, somehow, one particular build won't upload, while another image uploads just fine; time and time again I see the same behaviour.

The two images I build are WEB (virtual size around 390 MB) and API (virtual size around 420 MB). The one causing anomalies is the WEB image, which is actually the slightly smaller of the two, so I see no size problem there.

Again, all was running fine until this particular image showed up. I have created new builds of varying sizes, but it just won't upload. The other image uploads fine afterwards, into the same repository, which is what makes this case so interesting (and is driving me insane ;). I don't believe this to be an issue with the AWS ELB SSL settings, as I do SSL termination in the nginx container, and all the other services run fine in the same architecture.

In response to future questions as to why the v1 backend is necessary: it is needed to accommodate wercker, which (still) pushes to the v1 API. The registry then redirects the traffic to the v2 backend, where the images are stored.
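For context, the nginx split between the two backends looks roughly like this (upstream names, ports and cert paths are illustrative here, not my exact config):

```nginx
upstream registry_v1 { server registry-v1:5000; }
upstream registry_v2 { server registry-v2:5000; }

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/wildcard.crt;
    ssl_certificate_key /etc/nginx/certs/wildcard.key;

    # layer uploads can be large, so don't cap the body size
    client_max_body_size 0;
    chunked_transfer_encoding on;

    # v2 traffic (blob uploads) goes to the v2 registry
    location /v2/ { proxy_pass http://registry_v2; }

    # wercker still talks v1; that backend redirects to v2 for storage
    location /v1/ { proxy_pass http://registry_v1; }
}
```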

The logs of the registries (v1 and v2 shown) show the following output (and in this order):

PUT /v1/repositories/web/ 01/Apr/2016:09:47:41 +0000 DEBUG: args = {'namespace': 'library', 'repository': u'xxxxx'}

POST /v2/xxxxx/blobs/uploads/

time="2016-04-01T10:07:31Z" level=info msg="response completed" go.version=go1.5.3 http.request.host=xxxxx http.request.id=f3f5b5c0-44ce-4d1b-9f41-7cf9b06e6c3d http.request.method=POST http.request.remoteaddr=172.22.90.1 http.request.uri="/v2/xxxxx/blobs/uploads/" http.request.useragent="docker/1.9.1 go/go1.4.3 git-commit/9894698 kernel/4.3.6-coreos os/linux arch/amd64" http.response.duration=196.065061ms http.response.status=202 http.response.written=0 instance.id=741a8348-2a62-4b49-8f78-99f102bf7593 version=v2.3.1

PATCH /v2/REPO/blobs/uploads/30bbaca1-3c4a-4766-a59e-8dd6fc1ebc25 [...]

time="2016-04-01T09:49:42Z" level=error msg="client disconnected during blob PATCH" go.version=go1.5.3 http.request.host=xxxxx http.request.id=05dd5386-e797-4122-be43-4d2c564b28be http.request.method=PATCH http.request.remoteaddr=172.22.90.1 http.request.uri="/v2/xxxxx/blobs/uploads/30bbaca1-3c4a-4766-a59e-8dd6fc1ebc25?_state=E_ajSTSwyO48bb-dO9hmnXaPXxTH9Bc2PdB2BMaFki97Ik5hbWUiOiJqdW5nby13ZWIiLCJVVUlEIjoiMzBiYmFjYTEtM2M0YS00NzY2LWE1OWUtOGRkNmZjMWViYzI1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE2LTA0LTAxVDA5OjQ3OjU5LjM4NDEzNjkyOVoifQ%3D%3D"

The docker client seems to not receive a termination signal (or something like that) from the registry, making it upload the first layer forever and ultimately time out. Nothing gets tagged, and the upload is purged.
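For reference, the flow traced in the logs above is the three-step v2 blob upload: POST opens an upload session (the 202 response), PATCH streams the layer bytes (where my client disconnects), and PUT would commit the blob by digest. A minimal sketch of the endpoints involved (hostname, repo name, UUID and digest are placeholders):

```python
def v2_upload_urls(registry, repo, uuid, digest):
    """Build the URLs used by the three-step Docker Registry v2 blob upload."""
    base = f"https://{registry}/v2/{repo}/blobs/uploads"
    return {
        # POST: start an upload session; 202 response carries the UUID in Location
        "start": f"{base}/",
        # PATCH: stream the blob bytes in chunks to the session URL
        "patch": f"{base}/{uuid}",
        # PUT: finalize the upload, naming the blob by its digest
        "commit": f"{base}/{uuid}?digest={digest}",
    }
```

In my case the push never gets past the PATCH step, so no commit (and no tag) ever happens.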

EDIT: I have successfully pushed the image by hand, using the 1.10.1 docker CLI, so the issue must be with the wercker docker CLI ;(

Morriz