
While checking how to use the Cassandra connector, the documentation instructs you to add this to the sbt file:

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M1"

In general, is there an obvious, straightforward logic to translate this into the corresponding:

spark-shell --packages "field1":"field2"

I've tried:

spark-shell --packages "com.datastax.spark":"spark-cassandra-connector"

and a few other things but that doesn't work.

elelias
    check this question: http://stackoverflow.com/questions/25837436/how-to-load-spark-cassandra-connector-in-the-shell – drstein Mar 18 '16 at 09:35
  • yeah, saw that. It solves the issue of having the Cassandra connector on the shell, but I'm more interested in the general case, whether there is a logic there somehow – elelias Mar 18 '16 at 09:40

3 Answers


I believe it is --packages "groupId:artifactId:version". If you have multiple packages, you can separate them with commas (no spaces): --packages "groupId1:artifactId1:version1,groupId2:artifactId2:version2"
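As a quick sketch, the joined coordinate string can be built and passed in one go (the coordinates below are placeholders, not real artifacts):

```shell
# Hypothetical coordinates, for illustration only.
PKG1="groupId1:artifactId1:version1"
PKG2="groupId2:artifactId2:version2"
# Join with a comma and no spaces, as --packages expects:
PACKAGES="${PKG1},${PKG2}"
echo "$PACKAGES"
# spark-shell --packages "$PACKAGES"
```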

In sbt

val appDependencies = Seq(
  "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.6.0-M1"
)

and

val appDependencies = Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M1"
)

are identical. When you use the %% syntax (after the groupId) in sbt, it automatically picks the artifact for your Scala version, so with Scala 2.10 it turns spark-cassandra-connector into spark-cassandra-connector_2.10. I'm not sure spark-shell does this for you, so you might need to ask for the Scala 2.10 version of the artifact explicitly, like this: --packages "com.datastax.spark:spark-cassandra-connector_2.10:1.6.0-M1"
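To make that translation concrete, here is a small sketch of going from the sbt %% form to a --packages coordinate, assuming a Scala 2.10 build (the variable names are mine, not part of any tool):

```shell
GROUP="com.datastax.spark"
ARTIFACT="spark-cassandra-connector"
VERSION="1.6.0-M1"
SCALA_BINARY="2.10"
# sbt's %% appends _<scala binary version> to the artifact name,
# so append it explicitly when building the coordinate by hand:
COORD="${GROUP}:${ARTIFACT}_${SCALA_BINARY}:${VERSION}"
echo "$COORD"
# spark-shell --packages "$COORD"
```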

Daniel B.

Version should be specified.

spark-shell --packages "com.datastax.spark":"spark-cassandra-connector_2.11":"2.0.0-M3"

You can find version information at http://search.maven.org/#search%7Cga%7C1%7Cspark-cassandra-connector .

MyounghoonKim

Follow the instructions as posted on the Spark Packages Website

To use the Spark-Shell

$SPARK_HOME/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10

There are also instructions for a variety of build systems

SBT

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.0-M1-s_2.11"

And Maven

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>datastax</groupId>
    <artifactId>spark-cassandra-connector</artifactId>
    <version>1.6.0-M1-s_2.11</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>
RussS