
Is there an easy way to load data from Azure Databricks Spark DB to GCP Databricks Spark DB?

1 Answer

  1. Obtain the JDBC connection details from the Azure workspace and use them in GCP to pull the data, just as you would from any other JDBC source.
// This is run in the GCP instance
val some_table = spark.read
  .format("jdbc")
  .option("url", "jdbc:databricks://adb-xxxx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/xxxx;AuthMech=3;UID=token;PWD=xxxx")
  .option("dbtable", "some_table")
  .load()
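If the goal is a one-time copy rather than reading over JDBC on every query, you can then persist the pulled DataFrame on the GCP side. A minimal sketch, assuming a placeholder target schema/table name:

// Hypothetical follow-up: write the JDBC-sourced DataFrame as a Delta table in the GCP workspace
some_table.write
  .format("delta")
  .mode("overwrite")                            // replace any previous copy
  .saveAsTable("<target_schema>.some_table")    // placeholder target name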
  2. Assuming the Azure data is stored in Blob/ADLS Gen2 storage, mount it in the GCP instance's DBFS and read the data directly.
// This is run in GCP instance
// Assuming ADLSv2 on Azure side
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

val some_data = spark.read
  .format("delta")
  .load("/mnt/<mount-name>/<some_schema>/<some_table>")
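If you would rather avoid a mount, the same OAuth credentials can instead be applied as session-scoped Spark configurations and the abfss:// path read directly. A sketch under that assumption, using the same placeholders as above:

// Alternative sketch: session-scoped OAuth configs instead of a DBFS mount
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope = "<scope-name>", key = "<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")

// Read the Delta table straight from ADLS Gen2 without mounting
val some_data = spark.read
  .format("delta")
  .load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<some_schema>/<some_table>")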