
I am trying to create a Python job on Dataflow that needs a Cloud SQL connection (and I'm a total beginner). I need to execute several MySQL queries in a ParDo (Apache Beam). I am using PyMySQL and had a problem authenticating, so I tried this answer, and apparently it works:

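import os
import apache_beam as beam

class MyDoFn(beam.DoFn):
    def setup(self):
        # Download the Cloud SQL proxy binary to the worker and make it executable.
        os.system("wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy")
        os.system("chmod +x cloud_sql_proxy")
        # Start the proxy in the background; self.sql_args is set elsewhere (not shown).
        os.system(f"./cloud_sql_proxy -instances={self.sql_args['cloud_sql_connection_name']}=tcp:3306 &")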

The thing is, I find this to be more of a work-around. Is it safe to authenticate this way? I would appreciate any help! Thank you in advance.

mipu

1 Answer


Yes, this is a safe way to use a Cloud SQL connection. The cloud_sql_proxy uses authentication info from the Compute Engine instance to properly authenticate the connection. See https://cloud.google.com/sql/docs/mysql/sql-proxy#authentication-options for more about this.
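
One practical detail: because the proxy is started in the background, the first query can race against the proxy's startup. Here is a minimal sketch of a readiness check, assuming the proxy listens on 127.0.0.1:3306 as in the question. The helper name `wait_for_port` and its default timeout are hypothetical, not part of Beam or the proxy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Return True once something accepts TCP connections on host:port.

    Hypothetical helper: polls until the timeout expires, then returns False.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means the proxy (or any listener) is up.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and retry.
            time.sleep(0.5)
    return False
```

You could call `wait_for_port("127.0.0.1", 3306)` at the end of `setup` and only then open the PyMySQL connection against host `127.0.0.1` and port `3306`. The database user and password are still whatever you configured in Cloud SQL; the proxy only takes care of authorizing and encrypting the connection.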

danielm
  • Is it necessary to close the proxy connection right afterwards, or can I just leave it until the job is finished and let the OS close it by itself? And thanks a lot! – mipu Jul 15 '20 at 20:27
  • VM shutdown will terminate the proxy process, so just leave it running until the job terminates. – danielm Jul 16 '20 at 17:37
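
As the comments say, explicit cleanup is not required. If you would still rather keep a handle on the proxy than launch it with `os.system(... &)`, a rough sketch with `subprocess.Popen` could look like the following. The names `build_proxy_command` and `ManagedProcess` are made up for illustration, and the binary path and `-instances` flag are simply the ones from the question.

```python
import subprocess
from typing import List, Optional


def build_proxy_command(connection_name: str, port: int = 3306,
                        binary: str = "./cloud_sql_proxy") -> List[str]:
    """Build the same proxy command line the question uses (hypothetical helper)."""
    return [binary, f"-instances={connection_name}=tcp:{port}"]


class ManagedProcess:
    """Hypothetical wrapper that starts a command and can stop it later."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self._proc: Optional[subprocess.Popen] = None

    def start(self) -> None:
        # Only launch if there is no live process already.
        if not self.is_running():
            self._proc = subprocess.Popen(self.command)

    def is_running(self) -> bool:
        return self._proc is not None and self._proc.poll() is None

    def stop(self, timeout_s: float = 10.0) -> None:
        if self._proc is None:
            return
        self._proc.terminate()  # Ask the process to exit.
        try:
            self._proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            self._proc.kill()  # Force it if it does not exit in time.
            self._proc.wait()
        self._proc = None
```

In a DoFn you would call `start()` in `setup` and `stop()` in `teardown`. Beam only calls `teardown` on a best-effort basis, so this is tidiness rather than a guarantee, and VM shutdown remains the real backstop.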