
I am new to Synapse Analytics and I want to create a notebook that could later be used as a step in a pipeline (like a Databricks notebook) to process the data in one of the tables from the DB.

I want to do everything in PySpark, so I am wondering how to read the data from the existing DB using the notebook. NOTE: I don't want to use a method where I have to mention my password etc.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SynapseAnalyticsDemo").getOrCreate()

# Set up Synapse Analytics credentials
synapse_servername = "<synapse_servername>.sql.azuresynapse.net"
synapse_database = "<synapse_database>"
synapse_username = "<synapse_username>"
synapse_password = "<synapse_password>"
synapse_jdbc_url = f"jdbc:sqlserver://{synapse_servername}:1433;database={synapse_database};user={synapse_username};password={synapse_password}"

# Define the SQL query and table name
table_name = "<sql_pool_table>"
query = f"SELECT * FROM {table_name} WHERE some_column = 'some_value'"

# Read the query result over JDBC
df = spark.read.format("jdbc").option("url", synapse_jdbc_url).option("query", query).load()

This is the only method I have found, but taking into account that many people will be able to see the notebook, I don't want to use it, since I would have to mention the password.

I have also tried this way, but it does not work:

%%pyspark
df = spark.sql("SELECT * FROM TableName")
df.show()

In the second method I wrote the table name exactly as it appears in SQL Server Management Studio, and then I also tried prefixing it with the Azure Synapse dedicated SQL pool name, but neither worked.

How can I access the Azure Synapse dedicated SQL pool using the notebook (PySpark)?

Eli

1 Answer


You can access dedicated SQL pool data in an Azure Synapse notebook without mentioning the password by storing the password in Azure Key Vault. To do that, create an Azure Key Vault and create a secret containing the database password.

Retrieve the secret value using the code below:

from notebookutils import mssparkutils
mssparkutils.credentials.getSecret('AzureKeyvaultName', 'secretName')
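If the notebook user's own identity is not allowed to read the vault directly, `getSecret` also accepts a third argument naming a Key Vault linked service, so the request authenticates through that linked service instead. A hedged sketch (the linked service name `AzureKeyVault1` is an assumption; use whatever name was given in Synapse Studio):

```python
from notebookutils import mssparkutils

# Assumption: 'AzureKeyVault1' is a Key Vault linked service defined in
# Synapse Studio; the third argument routes authentication through it.
secret = mssparkutils.credentials.getSecret('AzureKeyvaultName', 'secretName', 'AzureKeyVault1')
```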

Access the dedicated SQL database using the code below:

from notebookutils import mssparkutils

jdbcHostname = "synpool.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "dedsql"
jdbcUsername = "sqladminuser"
# Retrieve the password from Azure Key Vault instead of hard-coding it
jdbcPassword = mssparkutils.credentials.getSecret('sqlservercred', 'sqlpassword')
jdbcDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

table = "students"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};databaseName={jdbcDatabase}"

df1 = (spark.read.format("jdbc")
       .option("driver", jdbcDriver)
       .option("url", jdbcUrl)
       .option("dbtable", table)
       .option("user", jdbcUsername)
       .option("password", jdbcPassword)
       .load())
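If only a filtered subset of the table is needed (as in the question's `WHERE some_column = 'some_value'` query), the same connection can use a `query` option instead of `dbtable`, so the filter runs inside the SQL pool. A minimal sketch; the helper function and sample values are illustrative, not part of the original answer:

```python
def make_jdbc_options(host, port, database, user, password, query):
    """Build the option dict for spark.read.format('jdbc') with query pushdown."""
    return {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "query": query,  # executed in the dedicated SQL pool, not in Spark
        "user": user,
        "password": password,
    }

opts = make_jdbc_options("synpool.database.windows.net", 1433, "dedsql",
                         "sqladminuser", "<secret-from-key-vault>",
                         "SELECT * FROM students WHERE grade = 'A'")
# In the notebook: df = spark.read.format("jdbc").options(**opts).load()
```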


df1.show()
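As an aside, recent Synapse Spark runtimes also ship a built-in connector for dedicated SQL pools that authenticates with Azure AD, so no password or secret is needed at all. A hedged sketch, assuming the caller's identity has been granted read access (e.g. `db_datareader`) on the pool, and using the sample database and table names from above:

```python
# Built-in Azure Synapse Dedicated SQL Pool Connector: authenticates via
# Azure AD, so no password is required anywhere in the notebook.
# Assumption: the workspace/user identity has db_datareader on the pool.
df = spark.read.synapsesql("dedsql.dbo.students")
df.show()
```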


Bhavani
  • Did you add access policies for the user in Azure Key Vault? – Bhavani Apr 26 '23 at 14:13
  • Could you please provide the error? – Bhavani Apr 26 '23 at 14:30
  • Sure, so if I put the real password for admin it works perfectly fine (thank you very much), otherwise if I use this line `jdbcPassword = mssparkutils.credentials.getSecret('sqlservercred', 'sqlpassword')`, I get following error: `An error occurred while calling ... Azure Key Vault returned error code 'Unauthorized' with message ...` – Eli Apr 26 '23 at 14:46
  • Did you give the permissions to the user to retrieve secret value in Azure Key Vault by adding access policies to the user? – Bhavani Apr 27 '23 at 07:45
  • Check [this](https://learn.microsoft.com/en-us/azure/key-vault/general/assign-access-policy?tabs=azure-portal) for adding access policies. – Bhavani Apr 27 '23 at 07:46
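The access-policy step discussed in the comments can also be done from the Azure CLI; a sketch with placeholder values (vault name and user principal are assumptions to be replaced):

```shell
# Grant a user permission to read and list secrets in the vault.
# Replace the vault name and --upn with real values.
az keyvault set-policy \
  --name "<AzureKeyvaultName>" \
  --secret-permissions get list \
  --upn "user@contoso.com"
```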