I have a MySQL database deployed to a Google Compute Engine instance, and I'm trying to move the data to BigQuery for some analysis. I'm trying to get this working with Cloud Data Fusion, but I'm encountering the following error:
java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) [spark-core_2.11-2.3.4.jar:2.3.4]
at io.cdap.cdap.app.runtime.spark.data.DatasetRDD.getPartitions(DatasetRDD.scala:61) ~[na:na]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) [spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84) ~[spark-core_2.11-2.3.4.jar:2.3.4]
The error itself is straightforward enough: Cloud Data Fusion isn't connecting successfully to my MySQL instance. The question is how to resolve this. Do I really have to set up some kind of VPC even though all the resources are in the same Google Cloud project? How can I inspect the networking environment of my Cloud Data Fusion cluster/environment so I can validate that the correct ports are open, etc.? I've entered my JDBC connection string into an external JDBC client and can access my database through its public IP, so I know the connection string works.
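As an additional sanity check beyond the JDBC client, a raw TCP probe of the MySQL port would tell me whether the port is reachable at all from a given machine (separating firewall problems from driver/auth problems). A minimal sketch — the IP below is a placeholder for my instance's public address, and 3306 is the default MySQL port:

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a plain TCP connection; True means something is listening
    and reachable, False means the connection was refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. run this from the environment that fails to connect:
# port_open("203.0.113.10", 3306)  # placeholder public IP of my GCE instance
```

If this returns False from inside the Data Fusion environment but True from my workstation, that would point at a firewall/VPC issue rather than anything wrong with the JDBC configuration.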