
I'm trying to connect to CloudSQL from a Python pipeline.


Current situation

  • I can connect without any problem using DirectRunner
  • I cannot connect using DataflowRunner

Connection function

def cloudSQL(input):
    # Imported inside the function so the module is resolved on the worker
    import pymysql
    connection = pymysql.connect(host='<server ip>',
                                 user='...',
                                 password='...',
                                 db='...')
    try:
        cursor = connection.cursor()
        cursor.execute("select ...")
        # Read the row before the connection is closed
        result = cursor.fetchone()
    finally:
        connection.close()
    if result is not None:
        yield input

The error

This is the error message using DataflowRunner

OperationalError: (2003, "Can't connect to MySQL server on '<server ip>' (timed out)")

CloudSQL

The instance has a public IP (to test locally with DirectRunner). I have also tried activating a private IP to see whether that was what prevented the connection with DataflowRunner.


Option2

I have also tried with

connection = pymysql.connect(unix_socket='/cloudsql/' + <INSTANCE_CONNECTION_NAME>,
                             user='...',
                             password='...',
                             db='...')

With the error:

OperationalError: (2003, "Can't connect to MySQL server on 'localhost' ([Errno 2] No such file or directory)")
IoT user

4 Answers


Take a look at the Cloud SQL Proxy. It will create a local entrypoint (Unix socket or TCP port depending on what you configure) that will proxy and authenticate connections to your Cloud SQL instance.
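As a rough illustration of the two entrypoints mentioned above, the sketch below builds the keyword arguments for pymysql.connect() for either mode. It assumes the proxy is already running next to the pipeline code; `proxy_connection_kwargs` is a hypothetical helper, and the `/cloudsql` socket directory, `127.0.0.1` and port `3306` are assumed defaults that depend on how the proxy was started.

```python
def proxy_connection_kwargs(user, password, db,
                            instance_connection_name=None,
                            socket_dir="/cloudsql",
                            host="127.0.0.1", port=3306):
    """Build pymysql.connect() keyword arguments for a local Cloud SQL Proxy.

    Hypothetical helper: socket_dir, host and port are assumptions and must
    match the way the proxy was actually configured.
    """
    kwargs = {"user": user, "password": password, "db": db}
    if instance_connection_name:
        # Unix socket mode: assumes one socket per instance under socket_dir,
        # named after the instance connection name.
        kwargs["unix_socket"] = f"{socket_dir}/{instance_connection_name}"
    else:
        # TCP mode: assumes the proxy listens on a local port.
        kwargs["host"] = host
        kwargs["port"] = port
    return kwargs
```

The result would then be passed on as pymysql.connect(**proxy_connection_kwargs(...)), so the connection code stays the same whichever mode the proxy uses.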

kurtisvg

You would have to mimic the implementation of JdbcIO.read() in Python as explained in this StackOverflow answer

Omair

With this solution I was able to access CloudSQL.

For testing purposes, you can add 0.0.0.0/0 to the authorized networks of the CloudSQL public IP, without using certificates.
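This presumably works because 0.0.0.0/0 matches every IPv4 address, so whatever external IP a Dataflow worker ends up with is accepted. The sketch below only illustrates that matching rule with Python's standard ipaddress module; `is_authorized` is a hypothetical helper and not part of any Cloud SQL API.

```python
import ipaddress


def is_authorized(client_ip, authorized_networks):
    """Return True if client_ip falls inside any of the given CIDR networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(network)
               for network in authorized_networks)
```

With ["0.0.0.0/0"] every IPv4 client passes the check, while a narrower range such as ["203.0.113.0/24"] admits only addresses inside it.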

IoT user

I created an example that runs Cloud SQL Proxy inside the Dataflow worker container and connects from the Python pipeline over Unix sockets, with no need for SSL or IP authorization.

This way the pipeline is able to connect to multiple Cloud SQL instances.

https://github.com/jccatrinck/dataflow-cloud-sql-python

The repository includes a screenshot of the log output listing the database tables as an example.
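When several instances are reached over Unix sockets, it can help to check that each socket exists before connecting, since a missing socket file surfaces as the "[Errno 2] No such file or directory" error quoted in the question. A minimal sketch, assuming the proxy creates one socket per instance under /cloudsql, named after the instance connection name; `missing_sockets` is a hypothetical helper and is not taken from the repository above.

```python
import os


def missing_sockets(instance_connection_names, socket_dir="/cloudsql"):
    """Return the instance names that have no socket file in socket_dir."""
    return [name for name in instance_connection_names
            if not os.path.exists(os.path.join(socket_dir, name))]
```

An empty result suggests every listed instance has a socket in place; any names returned point to instances the proxy has not exposed (or not yet).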

Jessé Catrinck
  • Good work. Thanks! Do you have another example where you integrate SQL proxy without docker/dockerfile? – Dave Feb 06 '22 at 12:56
  • If anyone has trouble doing what Dave described: I did try, but ended up having to deal with SSL certificates, which I didn't attempt. That is why I created this repo, to help the next person who needs to connect to Cloud SQL. – Jessé Catrinck Aug 23 '22 at 18:12