I have a Cloud SQL instance storing data in a database, and I have checked the option on this instance to block all unencrypted connections. When I select this option, I am given three separate .pem files: a server CA certificate, a client certificate (public key), and a client private key (link to relevant CloudSQL+SSL documentation). These files are used to establish encrypted connections to the Cloud SQL instance.
I'm able to successfully connect to Cloud SQL with the mysql command-line client, using the --ssl-ca, --ssl-cert, and --ssl-key options to specify the server CA certificate, client certificate, and client private key, respectively:
mysql -uroot -p -h <host-ip-address> \
--ssl-ca=server-ca.pem \
--ssl-cert=client-cert.pem \
--ssl-key=client-key.pem
I am now trying to run a PySpark job that connects to this Cloud SQL instance to extract the data for analysis. The PySpark job is essentially the same as this example provided by the Google Cloud training team. On line 39 of that script, a JDBC connection is made to the Cloud SQL instance:
jdbcDriver = 'com.mysql.jdbc.Driver'
jdbcUrl = 'jdbc:mysql://%s:3306/%s?user=%s&password=%s' % (CLOUDSQL_INSTANCE_IP, CLOUDSQL_DB_NAME, CLOUDSQL_USER, CLOUDSQL_PWD)
but this does not make an encrypted connection and does not provide the three certificate files. If I have unencrypted connections to the Cloud SQL instance disabled, I see the following error message:
17/09/21 06:23:21 INFO org.spark_project.jetty.util.log: Logging initialized @5353ms
17/09/21 06:23:21 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT
17/09/21 06:23:21 INFO org.spark_project.jetty.server.Server: Started @5426ms
17/09/21 06:23:21 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@74af54ac{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
[...snip...]
py4j.protocol.Py4JJavaError: An error occurred while calling o51.load.
: java.sql.SQLException: Access denied for user 'root'@'<cloud-sql-instance-ip>' (using password: YES)
whereas if I have unencrypted connections to the Cloud SQL instance enabled, the job runs just fine. (This indicates that the issue is not with Cloud SQL API permissions: the cluster I'm running the PySpark job from definitely has permission to access the Cloud SQL instance.)
The JDBC connection strings I have found involving SSL either add a &useSSL=true or &encrypt=true parameter without pointing to external certificate files, or they load the certificates into a keystore through some Java-specific procedure. How can I modify the JDBC connection string in the Python script linked above so that JDBC (via PySpark) is pointed to the on-disk locations of the server CA certificate and the client certificate/private key (server-ca.pem, client-cert.pem, and client-key.pem)?
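To make that concrete, the closest I can get with URL-only parameters is a sketch like the following, which appends the useSSL/requireSSL flags (documented MySQL Connector/J connection properties) to the same URL the script builds. The placeholder values are illustrative, and nothing in this URL tells the driver where the three .pem files live:

```python
# Illustrative placeholder values; the real script reads these from its config
CLOUDSQL_INSTANCE_IP = '10.0.0.1'
CLOUDSQL_DB_NAME = 'mydb'
CLOUDSQL_USER = 'root'
CLOUDSQL_PWD = 'secret'

# Same URL as in the script, plus the SSL flags found in other examples.
# This asks Connector/J to negotiate TLS, but gives it no way to locate
# server-ca.pem, client-cert.pem, or client-key.pem on disk.
jdbcUrl = ('jdbc:mysql://%s:3306/%s?user=%s&password=%s'
           % (CLOUDSQL_INSTANCE_IP, CLOUDSQL_DB_NAME,
              CLOUDSQL_USER, CLOUDSQL_PWD)
           + '&useSSL=true&requireSSL=true')
```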