
I want to connect to Cassandra from Spark. Connecting on the default port works, but the job fails when I try to connect over SSL. Below is the code:

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
    .config("spark.cassandra.connection.host","server.abc")
    .config("spark.cassandra.connection.port","9142")
    .config("spark.cassandra.connection.ssl.enabled",true)
    .config("spark.cassandra.connection.ssl.trustStore.path","s3:/dev-code/certs/trust.jks")
    .config("spark.cassandra.connection.ssl.trustStore.password","mypass")
    .config("spark.cassandra.auth.username","myuser")
    .config("spark.cassandra.auth.password","userpass")
    .appName("CassandraIntegration").getOrCreate()

FYI: the job has access to the S3 bucket; I am able to read a CSV file from the same location. Both ports, 9042 and 9142, are open. I also tried closing 9042 and keeping only 9142, but the error persists.

Below is the error:

ERROR [main] glue.ProcessLauncher (Logging.scala:logError(94)): Exception in User Class
java.io.IOException: Failed to open native connection to Cassandra at {server.abc:9142} :: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
    at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:173)
    at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$1(CassandraConnector.scala:161)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
    at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
    at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
    at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
    at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
    at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
    at MyCsvToCassandrsJob$.main(csv-to-cassanra-job:63)
    at MyCsvToCassandrsJob.main(csv-to-cassanra-job-job)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke(ProcessLauncher.scala:47)
    at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke$(ProcessLauncher.scala:47)
    at com.amazonaws.services.glue.ProcessLauncher$$anon$1.invoke(ProcessLauncher.scala:75)
    at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:123)
    at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:29)
    at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
Caused by: java.lang.IllegalArgumentException: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:253)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:108)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslEngineFactory(DefaultDriverContext.java:414)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.lambda$new$4(DefaultDriverContext.java:279)
    at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslEngineFactory(DefaultDriverContext.java:733)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslHandlerFactory(DefaultDriverContext.java:470)
    at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslHandlerFactory(DefaultDriverContext.java:799)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:348)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1100(DefaultSession.java:300)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:146)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
    at com.datastax.oss.driver.shaded.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at com.datastax.oss.driver.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cannot initialize SSL Context
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:74)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:246)
    ... 18 more
Caused by: java.nio.file.NoSuchFileException: s3:/dev-code/certs/trust.jks
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.buildContext(DefaultSslEngineFactory.java:119)
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:72)
    ... 23 more

Any workaround for this problem would be a big help.

Anbinson
  • The problem is that the Java driver, which performs the actual connection, doesn't know anything about S3 URLs and expects a local file path. Theoretically you can specify the files via `--files` – Alex Ott Sep 17 '21 at 14:52
  • Thanks for the reply. I tried adding a new parameter, --extra-files, with the value s3://dev-code/certs/trust.jks, but still got the same error: Caused by: java.nio.file.NoSuchFileException: /tmp/trust.jks – Anbinson Sep 17 '21 at 15:25

2 Answers

0

At the bottom of your error message, I see this:

NoSuchFileException: s3:/dev-code/certs/trust.jks

Alex is right, in that you need to provide a path to that file that the Spark connector can actually get to. From the looks of it, S3 won't work here.
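The failure can be reproduced without Spark or Cassandra. The following is a minimal sketch, assuming a Unix-like host (as on Glue) and relying only on what the stack trace shows: the driver opens the truststore through java.nio.file.Files.newInputStream, which resolves the string against the local file system. The object and method names are hypothetical and exist only for illustration.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical helper, not part of the Spark Cassandra Connector.
object TrustStorePathCheck {

  // True only if `path` names a readable regular file on the LOCAL file system.
  // java.nio does not know the "s3:" scheme, so on Unix a string such as
  // "s3:/dev-code/certs/trust.jks" is treated as a relative local path.
  def isReadableLocalFile(path: String): Boolean =
    Try(Paths.get(path)).toOption.exists { p =>
      Files.isRegularFile(p) && Files.isReadable(p)
    }

  def main(args: Array[String]): Unit = {
    val candidates = Seq("s3:/dev-code/certs/trust.jks", "/tmp/trust.jks", "trust.jks")
    candidates.foreach { c =>
      println(s"$c -> readable local file: ${isReadableLocalFile(c)}")
    }
  }
}
```

Running a check like this on the driver before building the SparkSession shows which of the candidate paths, if any, the Java driver would be able to open.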

Aaron
  • Hi Aaron, I tried adding it, but no luck. I can only find PySpark references; is there any documentation for Scala on accessing the --extra-files value? – Anbinson Sep 17 '21 at 15:39
  • I listed the files present in /tmp and could see that the .jks file is there, but I am still getting the same error – Anbinson Sep 17 '21 at 16:01
  • @Anbinson a similar question on PySpark might help. In that case, they did reference Google's cloud storage, so maybe it can be done. https://stackoverflow.com/questions/34939520/while-submit-job-with-pyspark-how-to-access-static-files-upload-with-files-ar – Aaron Sep 17 '21 at 18:20
0

I added the .jks file from S3 to the "Referenced files path" of the Glue job and then referred to it by file name only, since the file is automatically placed under the /tmp folder. That alone did not solve the issue.

From this website, I understood that we need to provide all the default values as well.

Below is my final code:

val spark: SparkSession = SparkSession.builder()
    .config("spark.cassandra.connection.host","server.abc")
    .config("spark.cassandra.connection.port","9142")
    .config("spark.cassandra.connection.ssl.enabled",true)
    .config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA")
    .config("spark.cassandra.connection.ssl.trustStore.path","trust.jks")
    .config("spark.cassandra.connection.ssl.trustStore.password","mypass")
    .config("spark.cassandra.connection.ssl.trustStore.type","JKS")
    .config("spark.cassandra.connection.ssl.protocol","TLS")
    .config("spark.cassandra.auth.username","myuser")
    .config("spark.cassandra.auth.password","userpass")
    .appName("CassandraIntegration").getOrCreate()
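Before handing these settings to the connector, it can help to confirm on the driver that the truststore really opens with the given path, password, and type. The sketch below is one way to do that with the standard java.security.KeyStore API; it is an illustration under assumptions, not part of the connector. The object name TrustStoreCheck and the method loadTrustStore are hypothetical, and the path and password in main are the placeholder values used in this post.

```scala
import java.nio.file.{Files, Paths}
import java.security.KeyStore
import scala.util.{Failure, Success, Try}

// Hypothetical pre-flight check, not part of the Spark Cassandra Connector.
object TrustStoreCheck {

  // Tries to open `path` as a local file and load it as a keystore.
  // Returns a Failure if the file is missing, unreadable, of the wrong type,
  // or if the password does not match.
  def loadTrustStore(path: String, password: String, storeType: String = "JKS"): Try[KeyStore] =
    Try {
      val in = Files.newInputStream(Paths.get(path))
      try {
        val ks = KeyStore.getInstance(storeType)
        ks.load(in, password.toCharArray)
        ks
      } finally in.close()
    }

  def main(args: Array[String]): Unit =
    // Placeholder values taken from the configuration in this post.
    loadTrustStore("trust.jks", "mypass") match {
      case Success(ks) => println(s"Truststore loaded with ${ks.size()} entries")
      case Failure(e)  => println(s"Truststore could not be loaded: $e")
    }
}
```

If the check fails with NoSuchFileException, the problem is the path; if it fails while loading, the password or the store type is the more likely cause.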
Subhashis Pandey
Anbinson