
Is there an easy way to load data from Azure Databricks Spark DB to GCP Databricks Spark DB?

1 Answer

  1. Obtain the JDBC connection details from the Azure workspace and use them in GCP to pull the data, just as you would from any other JDBC source.
// This is run in the GCP instance
val some_table = spark.read
  .format("jdbc")
  .option("url", "jdbc:databricks://adb-xxxx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/xxxx;AuthMech=3;UID=token;PWD=xxxx")
  .option("dbtable", "some_table")
  .load()
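If the goal is a one-time copy rather than reading over JDBC on every query, you can then persist the pulled DataFrame on the GCP side. A minimal sketch, assuming a placeholder target schema/table name:

// Hypothetical follow-up: write the JDBC-sourced DataFrame as a Delta table in the GCP workspace
some_table.write
  .format("delta")
  .mode("overwrite")                            // replace any previous copy
  .saveAsTable("<target_schema>.some_table")    // placeholder target name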
  2. Assuming the Azure data is stored in Blob/ADLS Gen2 storage, mount it in the GCP instance's DBFS and read the data directly.
// This is run in GCP instance
// Assuming ADLSv2 on Azure side
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

val some_data = spark.read
  .format("delta")
  .load("/mnt/<mount-name>/<some_schema>/<some_table>")
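If you would rather avoid a mount, the same OAuth credentials can instead be applied as session-scoped Spark configurations and the abfss:// path read directly. A sketch under that assumption, using the same placeholders as above:

// Alternative sketch: session-scoped OAuth configs instead of a DBFS mount
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope = "<scope-name>", key = "<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")

// Read the Delta table straight from ADLS Gen2 without mounting
val some_data = spark.read
  .format("delta")
  .load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<some_schema>/<some_table>")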