I have a MySQL database deployed to a Google Compute Engine instance, and I'm trying to move the data to BigQuery for some analysis. I'm trying to get this working with Cloud Data Fusion, but I'm encountering the following error:
java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:124) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) [spark-core_2.11-2.3.4.jar:2.3.4]
at io.cdap.cdap.app.runtime.spark.data.DatasetRDD.getPartitions(DatasetRDD.scala:61) ~[na:na]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) [spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84) ~[spark-core_2.11-2.3.4.jar:2.3.4]
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84) ~[spark-core_2.11-2.3.4.jar:2.3.4]
The error itself is straightforward enough: Cloud Data Fusion isn't connecting successfully to my MySQL instance. The question is how to resolve this. Do I really have to set up some kind of VPC even though all the resources are in the same Google Cloud project? How can I inspect the networking environment of my Cloud Data Fusion cluster/environment so I can validate that the correct ports are open, etc.? I've entered my JDBC connection string into an external JDBC client and can access my database through its public IP, so I know the connection string works.
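As an additional sanity check beyond the JDBC client, a raw TCP probe of the MySQL port would tell me whether the port is reachable at all from a given machine (separating firewall problems from driver/auth problems). A minimal sketch — the IP below is a placeholder for my instance's public address, and 3306 is the default MySQL port:

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a plain TCP connection; True means something is listening
    and reachable, False means the connection was refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. run this from the environment that fails to connect:
# port_open("203.0.113.10", 3306)  # placeholder public IP of my GCE instance
```

If this returns False from inside the Data Fusion environment but True from my workstation, that would point at a firewall/VPC issue rather than anything wrong with the JDBC configuration.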