
I've been working with Cassandra for a little while, and now I'm trying to set up Spark and the spark-cassandra-connector. I'm using IntelliJ IDEA for this (my first time with IntelliJ IDEA and Scala, too) on Windows 10.

build.gradle

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()

    flatDir {
        dirs 'runtime libs'
    }
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    compile group: 'log4j', name: 'log4j', version: '1.2.17'
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    archiveName = "ModuleName.jar"
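    // merge every compile-scope dependency into the jar so it is self-contained (uber jar)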
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.module.SentinelSparkModule'
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'

}

ModuleName.scala

package org.module
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._
import org.apache.spark.sql.types.TimestampType

object SentinelSparkModule {

  case class Document(id: Int, time: TimestampType, data: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("spark://192.168.0.3:7077")
      .appName("App")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .config("spark.cassandra.connection.port", "9042")
      .getOrCreate()

    //I'm trying it without [Document] since it throws 'Failed to map constructor parameter id in
    //org.module.ModuleName.Document to a column of keyspace.table' (see the note after this listing)

    val documentRDD = spark.sparkContext
      .cassandraTable/*[Document]*/("keyspace", "table")
      .select()
    documentRDD.take(10).foreach(println)
    spark.stop()
 }
}
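
As an aside, that mapping error usually means the connector cannot match a constructor parameter to a column: either the table has no column named id, or the parameter's type is not one it can map. Note that TimestampType is a Spark SQL schema descriptor, not a value type, so it cannot hold column data. A minimal sketch of a mappable case class (assuming the table really has columns id, time and data):

case class Document(id: Int, time: java.sql.Timestamp, data: String)

val documentRDD = spark.sparkContext
  .cassandraTable[Document]("keyspace", "table")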

I have a Spark master running at spark://192.168.0.3:7077 and a worker attached to it, but I haven't tried submitting the job as a compiled jar from the console; I'm just trying to get it to work in the IDE.

Thanks

Scaramouche

1 Answer


The Cassandra connector jar needs to be on the classpath of the workers. One way to do this is to build an uber jar with all the required dependencies and submit it to the cluster.

Refer to: Building a uberjar with Gradle

Also, make sure you change the scope of the dependencies in your build file from compile to provided for all jars except the Cassandra connector.
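
For example (a sketch of the idea; Gradle has no built-in provided scope, and compileOnly is the closest equivalent, which also keeps these jars out of the uber jar built from configurations.compile):

dependencies {
    // provided by the Spark distribution on the cluster, so compile-time only
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compileOnly group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'

    // not part of the Spark distribution, so it must be bundled
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
}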

Reference: https://reflectoring.io/maven-scopes-gradle-configurations/

Aravind Yarram
  • that question asks about SBT. do I have to use SBT instead of Gradle then? and the accepted answer ditches Intellij altogether, which I'd like to avoid – Scaramouche May 02 '20 at 19:38
  • @Scaramouche If you want to just run the app from IntelliJ then you should install the cassandra libs in spark worker nodes as a one time thing. Refer to this: https://stackoverflow.com/a/36879404/127320 – Aravind Yarram May 02 '20 at 19:43
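
For reference, an alternative to installing the libs on each worker is to let Spark distribute them itself via the spark.jars setting, which ships the listed jars to the executors when the context starts. A minimal sketch (the path is a placeholder and must point to a jar that bundles the connector together with its transitive dependencies):

val spark = SparkSession.builder
  .master("spark://192.168.0.3:7077")
  .appName("App")
  // placeholder path; the jar must include the connector and its dependencies
  .config("spark.jars", "C:/libs/spark-cassandra-connector-with-deps.jar")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()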