
I've noticed that we get frequent "communications link failure" messages with Google App Engine and Google Cloud SQL. It happens particularly when our App Engine app goes dormant and then has to wake up. This is more common on our test server, which sits idle more often than our production environment.

We recently switched to MySQL 2nd Gen (v5.7). I thought that might alleviate the problem because 2nd Gen instances remain active (i.e., activation policy = always on), but we still get "communications link failure" errors.

We also recently started using HikariCP connection pools. The same error still occurs, and it is caught by HikariCP.


Mike Dee
  • Possible duplicate of [com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure](https://stackoverflow.com/questions/2983248/com-mysql-jdbc-exceptions-jdbc4-communicationsexception-communications-link-fai) – siamsot Aug 27 '19 at 15:58
  • I don’t think that’s it. On a second attempt, the connection succeeds with no code changes. What is it about the App Engine environment that causes this? – Mike Dee Aug 27 '19 at 20:35
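Since a second attempt reportedly succeeds without any code changes, one stopgap while the root cause is investigated is to retry connection acquisition a bounded number of times. The Java sketch below is a hypothetical helper (`ConnectionRetry.withRetry` is not part of HikariCP or JDBC). It assumes the pool or driver reports these failures as `SQLTransientException` or `SQLRecoverableException`; confirm that against the exception type in your own stack trace before relying on it.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HikariCP or the JDBC API.
final class ConnectionRetry {
    private ConnectionRetry() {}

    /**
     * Runs action, making at most maxAttempts attempts in total while it fails with a
     * transient or recoverable SQL error. Sleeps backoffMillis * attempt between tries.
     * Any other exception is rethrown immediately without retrying.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SQLTransientException | SQLRecoverableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff before the next try
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

With a pooled `DataSource` named `dataSource` (hypothetical), a call could look like `Connection conn = ConnectionRetry.withRetry(dataSource::getConnection, 2, 500);`. Retrying only hides the cold-start failure; it does not explain it.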

2 Answers


There could be multiple root causes for this issue, including the number of connections to the instances, networking, firewall configurations, or application source code. The first thing to verify would be the number of connections to the instances. Take a look at this great answer for guidance [1].

One setting you could adjust is the `wait_timeout` flag on your Cloud SQL instance [2][3]. Since you are using App Engine Standard and your app goes dormant after long periods of inactivity, the first request afterwards triggers a startup process that takes some time, and it is possible your Cloud SQL instance isn’t waiting long enough for that process to complete.

[1] https://stackoverflow.com/a/10772407/5921021
[2] https://cloud.google.com/sql/docs/mysql/flags#list-flags
[3] https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_wait_timeout

PYB
  • The problem is not #1. After the GAE app spins up, the connection string works; it only times out the first time after GAE starts up. The `wait_timeout` is set to the default, which I think is 30 minutes. I think GAE shuts down (because of inactivity) in less than 30 minutes, which would kill the entire app, including the connection pool. So I don't think it is `wait_timeout`. It could be that the HikariCP connection timeout is too short (30 seconds is the default) and it takes GAE more than 30 seconds to start up. – Mike Dee Aug 29 '19 at 19:47
  • GAE usually takes much less than 30 seconds to start up. You can check exactly how long in your case by going to Stackdriver Logging [1] and searching for requests with the filter “protoPayload.wasLoadingRequest=true”. Such a request is a loading request, and the accompanying “protoPayload.latency” shows how long startup took. The log history might also reveal the chain of events leading up to the communications link failure, so it is worth looking into. – PYB Sep 11 '19 at 13:52
  • I suggest looking at this example [2], which uses the same components as your use case: App Engine, Cloud SQL, and Hikari. If you deploy this example successfully and don’t see the “communications link failure”, compare its configuration and code with your own to narrow down the culprit. Additionally, here are some important concepts for managing database connections to Cloud SQL [3]. – PYB Sep 11 '19 at 13:52
  • [1] https://cloud.google.com/logging/docs/view/overview [2] https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/cloud-sql/mysql/servlet [3] https://cloud.google.com/sql/docs/mysql/manage-connections – PYB Sep 11 '19 at 13:52

I tried two things that seemed to work.

First, I was wrong in thinking we were connecting to a 2nd Gen MySQL instance; we were actually still connecting to the 1st Gen instance. Those instances go dormant after a while, and that may have caused the connection pool timeouts.

Second, I increased the HikariCP connection timeout from 30 seconds to 60 seconds.

I'm not sure which of these changes eliminated the timeouts, but we rarely get them now.
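The 30-second figure discussed in the comments matches HikariCP's default `connectionTimeout` (30000 ms), so the 60-second change can be expressed as pool configuration. A minimal Java sketch, assuming HikariCP is the pool in use: it collects the settings in a plain `java.util.Properties` object so it compiles without HikariCP, and the result can be passed to `new HikariConfig(props)`. The `HikariSettings` class, JDBC URL, and credentials are hypothetical placeholders. The `maxLifetime` value only restates HikariCP's 30-minute default, which HikariCP's documentation recommends keeping shorter than any server-side connection limit such as MySQL's `wait_timeout`.

```java
import java.util.Properties;

// Hypothetical helper: builds HikariCP settings as Properties so this compiles without HikariCP.
final class HikariSettings {
    private HikariSettings() {}

    static Properties build(String jdbcUrl, String username, String password) {
        Properties props = new Properties();
        props.setProperty("jdbcUrl", jdbcUrl);
        props.setProperty("username", username);
        props.setProperty("password", password);
        // How long (ms) getConnection() waits for a pooled connection before failing.
        // HikariCP's default is 30000; raised to 60000 to cover a slow cold start.
        props.setProperty("connectionTimeout", "60000");
        // Maximum lifetime (ms) of a pooled connection: 1800000 ms = 30 min, HikariCP's default.
        // Keep it several seconds shorter than the server's wait_timeout.
        props.setProperty("maxLifetime", "1800000");
        return props;
    }
}
```

With HikariCP on the classpath, usage could look like `HikariDataSource ds = new HikariDataSource(new HikariConfig(HikariSettings.build(url, user, pass)));`. Calling `setConnectionTimeout(60_000)` directly on a `HikariConfig` has the same effect.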

Mike Dee