Is there an easy way to load data from Azure Databricks Spark DB to GCP Databricks Spark DB?
1 Answer
- Obtain the JDBC connection details from the Azure workspace and use them in GCP to pull data just as from any other JDBC source.
// This is run in the GCP workspace
val some_table = spark.read
  .format("jdbc")
  .option("url", "jdbc:databricks://adb-xxxx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/xxxx;AuthMech=3;UID=token;PWD=xxxx")
  .option("dbtable", "some_table")
  .load()
- Assuming the Azure data is stored in Blob/ADLS Gen2 storage, mount it in the GCP workspace's DBFS and read the data directly.
// This is run in the GCP workspace
// Assuming ADLS Gen2 on the Azure side
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "<scope-name>", key = "<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

val some_data = spark.read
  .format("delta")
  .load("/mnt/<mount-name>/<some_schema>/<some_table>")

Kombajn zbożowy
- mounting isn't recommended, and as I remember it doesn't work on GCP – Alex Ott Jan 10 '23 at 15:09
- Well, we can also [directly access it](https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage) without mounting. – Kombajn zbożowy Jan 10 '23 at 15:36
- yes. only direct connection should be supported – Alex Ott Jan 10 '23 at 15:37
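
For reference, the direct-access approach from the comments (reading ADLS Gen2 without a mount) would look roughly like this in the GCP workspace. This is only a sketch assuming the same OAuth service-principal credentials as in the mount example; all bracketed names are placeholders:

// Session-scoped ABFS credentials - no mount required (placeholder names throughout)
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope = "<scope-name>", key = "<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")

val some_data = spark.read
  .format("delta")
  .load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<some_schema>/<some_table>")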