4

I've been trying to add a new App Engine Module that uses Managed VMs instead of the default GAE sandbox. The intent is to provide a module where I can run newer versions of SciPy and NumPy, which can be called by my user-facing Module. I have successfully built and run my Docker images/containers locally, but have run into a lot of issues when I try to deploy to a custom version on the Google servers.

The following is from the serial console output for the Managed VM module instance, which continues to reboot due to issues that appear to be beyond my control.

Has anyone else run into these? Did I miss something during the configuration/deployment process?

FWIW: I've used GAE for several years and even contributed to it during my time at Google. I also have experience using Modules, as well as with Docker. The documentation and tooling around Managed VMs seems to be quite immature at the moment, and I've run out of steam trying to fight it. I need help.

Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
[   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec  9 00:52:56 gae-mvm-vmv7-tsia kernel: [   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
gcm-Heartbeat:1449622390000
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
===== Unexpected error during VM startup =====
=== Dump of VM runtime system logs follows ===
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_instances. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_proxy_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_extra_nginx_confs. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_redirect_appengine_googleapis_com. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_http_loadbalancer_enabled. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer_ip. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_memcache_proxy. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_monitoring_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_use_cloud_monitoring. Will treat it as an empty string.
vm_runtime_init: Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
vm_runtime_init: Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
8aa8a33b8daa451d5595b951aeecad772a23d65b6592ac07cae6265cc74b6312
Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Pulling metadata
16d49c9e1091: Pulling fs layer
16d49c9e1091: Download complete
54f405e77b26: Pulling metadata
54f405e77b26: Pulling fs layer
54f405e77b26: Download complete
36e2f6c710be: Pulling metadata
36e2f6c710be: Pulling fs layer
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
8aa8a33b8daa        gcr.io/google_appengine/fluentd-logger   "/opt/google-fluentd/"   33 seconds ago      Up 32 seconds                           insane_panini
Container: 8aa8a33b8daa
========= rebooting. ========================

INIT: 
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

Dec  9 00:53:14 gae-mvm-vmv7-tsia init: Switching to runlevel: 1
gcm-StatusUpdate:TIME=1449622394000;STATUS=COMMAND_FAILED;INVOCATION_ID=0
[[36minfo[39;49m] Using makefile-style concurrent boot in runlevel 1.
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpc.statd[1758]: Caught signal 15, un-registering and exiting
Dec  9 00:53:15 gae-mvm-vmv7-tsia google: shutdown script found in metadata.
[....] Stopping NFS common utilities: idmapd statd[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:15 gae-mvm-vmv7-tsia shutdownscript: Running shutdown script /var/run/google.shutdown.script
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
[....] Stopping rpcbind daemon...[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Stopping supervisor: supervisord.
udhcpd: Disabled. Edit /etc/default/udhcpd to enable it.
[....] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:16 gae-mvm-vmv7-tsia iscsid: iscsid shutting down.
[....] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Disconnecting iSCSI targets:iscsiadm: No matching sessions found
[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping iSCSI initiator service:[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:16 gae-mvm-vmv7-tsia shutdownscript: Finished running shutdown script /var/run/google.shutdown.script
[....] Stopping Docker: docker[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping The Kubernetes container manager: kubelet[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping enhanced syslogd: rsyslogd[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[   54.500923] docker0: port 1(veth8d67b7c) entered disabled state
[   54.512521] docker0: port 1(veth8d67b7c) entered disabled state
[   54.522249] device veth8d67b7c left promiscuous mode
[   54.527554] docker0: port 1(veth8d67b7c) entered disabled state
Terminating on signal number 15
Traceback (most recent call last):
  File "/usr/share/google/google_daemon/manage_accounts.py", line 94, in <module>
    options.daemon, options.force, options.debug)
  File "/usr/share/google/google_daemon/manage_accounts.py", line 65, in Main
    manager_daemon.StartDaemon()
  File "/usr/share/google/google_daemon/accounts_manager_daemon.py", line 73, in StartDaemon
    self.accounts_manager.Main()
  File "/usr/share/google/google_daemon/accounts_manager.py", line 87, in Main
    writer.close()
IOError: [Errno 32] Broken pipe
[....] Asking all remaining processes to terminate...acpid: exiting

[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0cdone.
[....] All processes ended within 1 seconds....[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0cdone.
[[36minfo[3
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

sulogin: root account is locked, starting shell
root@gae-mvm-vmv7-tsia:~# 

Edit: Additional information from the shutdown.log is below. The docker logs command is not being run in any of my code or Dockerfiles -- I assume there is a bug in the way Google is using it on their end.

2015-12-08 17:08:22.194 Sending SIGUSR1 to fluentd to trigger a log flush.
2015-12-08 17:08:22.194 605e9f1ad747e63560fdc28a8c7f3c276d77255edd0a65381f7ad2f9f8eafd2a
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 ---------------App was unhealthy, grabbing debug logs----------------
2015-12-08 17:08:22.194 --------------------------App stdout/stderr--------------------------
2015-12-08 17:08:22.194 /usr/share/vm_runtime/vm_shutdown.sh: line 22: /var/run/app.cid: No such file or directory
2015-12-08 17:08:22.194 docker: "logs" requires 1 argument.
2015-12-08 17:08:22.194 See 'docker logs --help'.
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Usage:  docker logs [OPTIONS] CONTAINER
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Fetch the logs of a container
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 --------------------------Tail of app logs---------------------------
2015-12-08 17:08:22.194 tail: cannot open `/var/log/app_engine/app/app.0.log.json' for reading: No such file or directory
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
erichiggins
  • 586
  • 1
  • 4
  • 9
  • Can you reproduce the problem outside of managed VM's? Like in container engine? – Bryan A Dec 09 '15 at 01:18
  • @PlanB I haven't tried to reproduce it in GCE, but as I mentioned, I was able to build a Docker container that runs locally on my machine and had no issues. I believe Google's deployment process injects additional layers on the image that I don't have access to, and the problem may be in there. Are Managed VM Docker containers able to run in GCE with little modification? – erichiggins Dec 09 '15 at 01:21

2 Answers2

2

EDIT

This answer is mostly correct, but inferring that the image doesn't exist based on curl was flawed. python-compat should work for you, but python is also a valid image, as can be seen from running docker pull gcr.io/google_appengine/python:

$ docker pull gcr.io/google_appengine/python
Pulling repository gcr.io/google_appengine/python
ac7db0912786: Download complete 
643a001c5ee0: Download complete 
559718b5f880: Download complete 
8f8068a6a6b4: Download complete 
16d49c9e1091: Download complete 
54f405e77b26: Download complete 
36e2f6c710be: Download complete 
e8aed8091139: Download complete 
8f0415d8e4e9: Download complete 
15ed20635873: Download complete 
6d70c8850a43: Download complete 
93ae290c32a1: Download complete 
7f766358fa71: Download complete 
7f4a74c30dc4: Download complete 
b51802c69e61: Download complete 
Status: Downloaded newer image for gcr.io/google_appengine/python:latest

In discussion with contributor jonparrott over at the github repo for the gcr.io/google_appengine/python docker image, the relation of the various python Managed VM / Custom Runtime docker images was clarified in a comment.

So, I think the issue you're seeing here, given that the VM fails to become healthy, could be related to the image you sourced from, your Dockerfile, your app code, or infrastructure. The first 3 are more likely than the last one, but it's not inconceivable. It seems the "Untar re-exec error: exit status 1: output: unexpected EOF" errors are the first outward manifestation of the problem during the serial console output.

This might be worth putting into an issue report for the Google Cloud Platform Public Issue Tracker along with information like your Dockerfile, your app's code if necessary, time-frames if it only occurred transiently, etc.


Original Answer

If you check the container registry, the image you're attempting to source FROM doesn't exist:

curl -v -X HEAD https://gcr.io/google_appengine/python
* Hostname was NOT found in DNS cache
*   Trying 74.125.193.82...
* Connected to gcr.io (74.125.193.82) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*        subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.googlecode.com
*        start date: 2015-12-02 14:40:36 GMT
*        expire date: 2016-03-01 00:00:00 GMT
*        subjectAltName: gcr.io matched
*        issuer: C=US; O=Google Inc; CN=Google Internet Authority G2
*        SSL certificate verify ok.
> HEAD /google_appengine/python HTTP/1.1
> User-Agent: curl/7.35.0
> Host: gcr.io
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Date: Thu, 10 Dec 2015 19:14:52 GMT
< Content-Type: text/html; charset=UTF-8
* Server Docker Registry is not blacklisted
< Server: Docker Registry
< Content-Length: 1584
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Alternate-Protocol: 443:quic,p=0
< Alt-Svc: clear
< 

The actually-existing container registry image for the python runtime is:

gcr.io/google_appengine/python-compat

This should probably be the key to solving your deployment failures.

Nick
  • 3,581
  • 1
  • 14
  • 36
  • 1
    The documentation for Google's [GoogleCloudPlatform/python-docker](https://github.com/GoogleCloudPlatform/python-docker) repo indicates that `gcr.io/google_appengine/python` is a valid base image to use for Managed VMs. This seems to conflict with the [documentation for custom images](https://cloud.google.com/appengine/docs/managed-vms/custom-runtimes#base_images), which also suggests using `python-compat`. I'll give this a try and report back, as well as open an issue on that GH repo. – erichiggins Dec 10 '15 at 21:42
  • 1
    Yup, that was it. I'm filing tickets to help others who might run into this issue. Thanks! – erichiggins Dec 11 '15 at 01:17
  • Yep, sending feedback when you see something like that is correct. Hope the issue is received well and the repo changes! – Nick Dec 11 '15 at 15:02
1

I don't know the answer, but I have some ideas.

Missing Image?

Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!

Indicates some problem getting the docker image from the registry. It doesn't sound like a 404 not found error. Nevertheless, I'd try the following:

Make sure you can pull the image from your development machine using gcloud docker pull. If not, push it to the registry.

Using App Engine Base Image?

WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_instances. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_proxy_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_extra_nginx_confs. Will treat it as an empty string.

This sounds like the local metadata server isn't running or properly configured. My guess is this means your custom docker image isn't using one of the standard base images, in particular the Python base image. Try updating your Dockerfile to use the standard python base image.

Doug Richardson
  • 10,483
  • 6
  • 51
  • 77
  • Thanks for the tips. My Dockerfile does use `FROM gcr.io/google_appengine/python`, which is one of the standard base images for GAE/Python/ManagedVMs. The Managed VM docs don't state a requirement to use Google's Container Registry explicitly, but it's entirely possible that the documentation is out of date or incorrect given how little it has helped up to this point. – erichiggins Dec 10 '15 at 04:15
  • `FROM gcr.io/google_appengine/python-compat` should be what it reads. I've never seen `FROM gcr.io/google_appengine/python` anywhere, is it possible that could be useful information? – Nick Dec 11 '15 at 14:21