
I found a similar post but it didn't help.

I've been working with Cassandra for a little while and now I'm trying to set up Spark and the spark-cassandra-connector. I'm using IntelliJ IDEA to do that (first time with IntelliJ IDEA and Scala too, so you get the picture).

My OS is Windows 10. This is what I've done:

Inside ../spark-2.4.5-bin-hadoop2.7/bin: spark-class.cmd org.apache.spark.deploy.master.Master

Inside ../spark-2.4.5-bin-hadoop2.7/bin: spark-class.cmd org.apache.spark.deploy.worker.Worker -c 1 spark://192.168.0.3:7077

build.gradle

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.11'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.4.0'
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

SparkModule.scala

package org.sentinel.spark_module

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object SentinelSparkModule {
  def main(args: Array[String]) {
    val conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.cassandra.connection.port", "9042")
      .setAppName("Sentinel").setMaster("spark://192.168.0.3:7077")

    val sc = new SparkContext(conf)
    val rdd = sc.cassandraTable("keyspace", "table")
    // group rows by the value of "column" and print the first 10 groups
    rdd.groupBy(row => row.getString("column")).take(10).foreach(println)
  }
}

Even though the error occurs, I can still see the app running at http://localhost:8080/ until I stop the execution in the IDE.

Excerpt of the full stack dump:

Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [/127.0.0.1:9042] Operation timed out))

Finally, even though it says the connection timed out, I am also querying Cassandra from my web app (Node.js) as I'm coding this, and those queries work fine. So I don't see why it'd be a problem on Cassandra's side, but I guess it could be.
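To narrow down whether the connector can open a session at all, independent of any RDD work, here is a minimal check I can run (a sketch; it assumes the same connection settings as in SentinelSparkModule and only reads the server version from the system table):

import org.apache.spark.SparkConf
import com.datastax.spark.connector.cql.CassandraConnector

object ConnectionCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.cassandra.connection.port", "9042")

    // CassandraConnector opens a session directly from the conf;
    // no SparkContext (and hence no executor) is involved.
    val release = CassandraConnector(conf).withSessionDo { session =>
      session.execute("SELECT release_version FROM system.local")
        .one().getString("release_version")
    }
    println(s"Connected, Cassandra release: $release")
  }
}

If this opens a session while the job still times out, the problem would be on the executor side rather than in the connection settings.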

Thanks

EDIT:

I included compile group: 'com.datastax.cassandra', name: 'cassandra-driver-core', version: '3.0.0' and got the same error. (version compatibility table)

EDIT:

nodetool status shows:

Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  138.59 MiB  256          100.0%            77d808e6-5c57-494a-b6fb-7e73593dbb46  rack1

EDIT:

cqlsh 127.0.0.1 9042 shows:

WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
WARNING: pyreadline dependency missing.  Install to enable tab completion.
cqlsh>
Scaramouche
  • you shouldn't include the cassandra driver explicitly - it's inside the connector – Alex Ott May 02 '20 at 07:58
  • can you do `nodetool status` from your Cassandra cluster? – Alex Ott May 02 '20 at 07:59
  • @AlexOtt *you shouldn't include the cassandra driver explicitly - it's inside the connector*. do you mean I should remove `compile group: 'com.datastax.cassandra', name: 'cassandra-driver-core', version: '3.0.0'`? also, I included `nodetool status`'s output. thanks – Scaramouche May 02 '20 at 15:42
  • yes, you need to remove this dependency - everything is in the connector – Alex Ott May 02 '20 at 17:15
  • can you also try to do `cqlsh 127.0.0.1 9042` ? – Alex Ott May 02 '20 at 17:23
  • @AlexOtt included `cqlsh 127.0.0.1 9042`'s output, although I have been able to use cql queries from the console all along – Scaramouche May 02 '20 at 17:31
  • something is strange, I'm not sure if it's a problem with Windows or not. Let's try via `spark-shell`: execute from the spark directory: `bin\spark-shell.cmd --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0`. And then inside enter: `import com.datastax.spark.connector._`, then `val rdd = sc.cassandraTable("keyspace", "table")` and then `rdd.count` (spelled out after these comments) – Alex Ott May 02 '20 at 18:04
  • @AlexOtt I think it all went well. after `rdd.count` there was a lot of logging, and finally output `res0: Long = 6646284` which I assume is the number of rows in the table – Scaramouche May 02 '20 at 18:21
  • Yes, it looks like it. Then we can narrow it down to something in the Idea setup... I would say it could be something like a firewall... – Alex Ott May 02 '20 at 18:27
  • @AlexOtt well, this is really odd, the error **java.io.IOException: Failed to open native connection to Cassandra at 127.0.0.1:9042** is not showing anymore. could it be simply network congestion or cassandra misconfiguration? taking into account it was being caused by **OperationTimedOutException**? instead now I'm getting **java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition** even after adding *spark-cassandra-connector_2.12-2.4.3* to my runtime libs. appreciate your effort and time. thanks – Scaramouche May 02 '20 at 18:45
  • You're using an SCC build for the wrong Scala version – Alex Ott May 02 '20 at 19:05
  • I'm now using the one from Maven's repository, with the versions according to [this table](https://github.com/datastax/spark-cassandra-connector#version-compatibility), but same error. I just posted [a question about it](https://stackoverflow.com/q/61565049/4770813); if you could answer that one and provide more detail about the versions, please. thanks – Scaramouche May 02 '20 at 19:21
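For reference, the spark-shell check from Alex Ott's comment, spelled out end to end (the count output is the one reported above; it assumes the keyspace and table exist):

bin\spark-shell.cmd --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0

scala> import com.datastax.spark.connector._
scala> val rdd = sc.cassandraTable("keyspace", "table")
scala> rdd.count
res0: Long = 6646284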

1 Answer


Is Cassandra also running on 192.168.0.3? Did you try changing spark.cassandra.connection.host to 192.168.0.3 instead? The reason you are seeing that error is that your Spark executor cannot connect to Cassandra at 127.0.0.1. I don't know anything about your setup, and you might have tried this already, but the solution could be as simple as that.
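In code that is a one-line change (a sketch, keeping the rest of the SparkConf exactly as posted in the question):

val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "192.168.0.3") // instead of 127.0.0.1
  .set("spark.cassandra.connection.port", "9042")
  .setAppName("Sentinel")
  .setMaster("spark://192.168.0.3:7077")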

user1516867
  • Cassandra is running on **127.0.0.1:9042**, at least that's the one I use in my node-cassandra module config, which I use to query cassandra from node without problems. isn't **127.0.0.1:9042** then the host and port I should use from spark as well? – Scaramouche May 02 '20 at 06:36
  • Where is your Spark master running? Is it on the same machine as where your Cassandra nodes are running? – user1516867 May 02 '20 at 07:42
  • yes, I'm working on localhost for the whole project, so everything is on the same computer. I have 1 master (**spark://192.168.0.3:7077**) and one worker in Spark, and I have Cassandra on **127.0.0.1:9042** – Scaramouche May 02 '20 at 15:23
  • As the other comments above show, one of the possible reasons for the error is a dependency issue. The Cassandra driver [uses Guava 16.0.1](https://github.com/datastax/spark-cassandra-connector/blob/f52c091de3ad001102c85c334faf18beaf937deb/project/Versions.scala#L33) so you could try using that instead. For what it is worth, you can also try upgrading the Cassandra Spark Connector to version 2.5.0 (don't use 3.0 as this is not compatible with Spark 2.4). – user1516867 May 03 '20 at 01:56
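Putting the dependency advice from these comment threads together, the build file might end up looking like this (a sketch, assuming Spark 2.4.5 on Scala 2.11, connector 2.5.0, no explicit cassandra-driver-core, and Guava forced to 16.0.1 as suggested above):

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.11'
    // the Scala suffix (_2.11) must match the Spark build; the Cassandra driver comes in via the connector
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
}

configurations.all {
    resolutionStrategy {
        // per the comment above, the driver bundled with the connector expects Guava 16.0.1
        force 'com.google.guava:guava:16.0.1'
    }
}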