1

I have been connecting with SQL Server using EMR Serverless App v-6.8.0 for Spark. So, I have tested code in local machine as well as on ec2 but when I ran the code on this serverless cluster I got an error.

Note: My VPC Security Group has enabled all traffic ports

So, this is my submit job,

    applicationId=app_id,
    executionRoleArn="my-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://emr-studio-rts/scripts/ms-sql-fetch.py",
            "entryPointArguments": ["s3://emr-studio-rts/output"],
            "sparkSubmitParameters": "--jars https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar --conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://emr-studio-rts/logs"}
        }
    },
)

Now I can show the error as, for the line,

spark = SparkSession\
        .builder\
        .appName('test-db') \
        .config('spark.driver.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
        .config('spark.executor.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
        .config("spark.executor.cores", "1") \
        .getOrCreate()

#read table data into a spark dataframe
    df1 = spark.read.format("jdbc") \
        .option("url", f"jdbc:sqlserver://{my_host}:1433;databaseName={my_database};") \
        .option("dbtable", table_name) \
        .option("user", my_user) \
        .option("password", my_password) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .load()

as follows,

Status Details: Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: : com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 3.12.0.70, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". py4j.protocol.Py4JJavaError: An error occurred while calling o93.load.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • Double check the egress rules on your security groups. – AlwaysLearning Nov 03 '22 at 22:42
  • can you explain a little @AlwaysLearning – Muhammad Ashir Ali Nov 03 '22 at 22:53
  • I can telnet to 3.12.0.70 on port 1433, so that port is open to the entire internet. Check which security groups are applied to your serverless environment, they must be missing an egress rule that allows tcp/1433 traffic to that address. – AlwaysLearning Nov 03 '22 at 23:46
  • In the configuration of EMR Serverless, its always taking a private subnet under VPC otherise throwing error ```Failed to configure application 'spark-streams (00f59qjqo97u2t0d)s' All subnets must be private```. So its not connected to outer internet, i don't understanding the point then how to connect it. – Muhammad Ashir Ali Nov 04 '22 at 06:23
  • 1
    I have corrected this issue by understanding that i have to give private subnet to configuration and then i have to map this private subnet to NAT gateway towards public subnet and then the public subnet to internet gateway. So, its actually an issue with the VPC. Thanks everyone for help. – Muhammad Ashir Ali Nov 10 '22 at 08:05

0 Answers0