I have been connecting with SQL Server using EMR Serverless App v-6.8.0 for Spark. So, I have tested code in local machine as well as on ec2 but when I ran the code on this serverless cluster I got an error.
Note: My VPC Security Group has enabled all traffic ports
So, this is my submit job,
applicationId=app_id,
executionRoleArn="my-role",
jobDriver={
"sparkSubmit": {
"entryPoint": "s3://emr-studio-rts/scripts/ms-sql-fetch.py",
"entryPointArguments": ["s3://emr-studio-rts/output"],
"sparkSubmitParameters": "--jars https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar --conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1",
}
},
configurationOverrides={
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://emr-studio-rts/logs"}
}
},
)
Now I can show the error as, for the line,
spark = SparkSession\
.builder\
.appName('test-db') \
.config('spark.driver.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
.config('spark.executor.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
.config("spark.executor.cores", "1") \
.getOrCreate()
#read table data into a spark dataframe
df1 = spark.read.format("jdbc") \
.option("url", f"jdbc:sqlserver://{my_host}:1433;databaseName={my_database};") \
.option("dbtable", table_name) \
.option("user", my_user) \
.option("password", my_password) \
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
.load()
as follows,
Status Details: Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: : com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 3.12.0.70, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". py4j.protocol.Py4JJavaError: An error occurred while calling o93.load.