
Here are my specs:

  • Cassandra version: 3.0.0
  • Operating system: Mac OS X Yosemite 10.10.5
  • Spark version: 1.4.1

Context:

I have created a keyspace "movies" and a table "movieinfo" in Cassandra. I have installed and assembled a jar file following the guidance from this post. I have written a small script (below) to test my connection:

scala> sc.stop

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf

scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> val conf = new SparkConf()
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2ae92511

scala> conf.set("cassandra.connection.host", "127.0.0.1")
res1: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2ae92511

scala> val sc = new SparkContext("local[*]", "Cassandra Test", conf)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b5251d

scala> val table = sc.cassandraTable("movies", "movieinfo")
table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15

scala> table.count

However, I receive the following stack trace:

15/11/24 09:21:30 WARN NettyUtil: Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
java.io.IOException: Failed to open native connection to Cassandra at {10.223.134.106}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:249)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:51)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:146)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:143)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1781)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1099)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC.<init>(<console>:57)
    at $iwC.<init>(<console>:59)
    at <init>(<console>:61)
    at .<init>(<console>:65)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.223.134.106:9042 (com.datastax.driver.core.TransportException: [/10.223.134.106:9042] Cannot connect))
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1393)
    at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:402)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:157)
    ... 70 more

I believe that I may need to modify the settings and dependencies in the pom.xml file (citing this post). However, as I am a novice in both Spark and Java, I would appreciate any guidance or feedback on how best to proceed. Thank you for your support.


2 Answers


This warning comes from the Java driver. It tells you that Netty's native epoll transport was found on your classpath, but this transport is only available on Linux, while you are running on Mac OS X.

If you are using Maven, check your dependencies to see if you are manually (or transitively) including this dependency:

    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-transport-native-epoll</artifactId>
      <version>...</version>
    </dependency>

If so, simply remove or exclude it. Otherwise, it is safe to disregard this warning.
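If the dependency arrives transitively, you can exclude it rather than remove it. A minimal sketch of such an exclusion (the enclosing artifact below is only an example; apply the exclusion to whichever dependency actually pulls Netty in):

    <dependency>
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector_2.10</artifactId>
      <version>1.4.1</version>
      <exclusions>
        <!-- keep the Linux-only native epoll transport off the classpath -->
        <exclusion>
          <groupId>io.netty</groupId>
          <artifactId>netty-transport-native-epoll</artifactId>
        </exclusion>
      </exclusions>
    </dependency>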

  • Within the pom.xml there is also this alternate dependency generator: https://gist.github.com/anonymous/558da59f48650f843851. In addition, how can I suppress this warning? I am running Spark through the terminal. Thank you for your help. – ahlusar1989 Nov 26 '15 at 13:31
  • Can you post your own pom.xml file? That would help people better understand your problem. Your gist quotes the driver's pom.xml and is about generating a jar with shaded Netty: see [our documentation](https://datastax.github.io/java-driver/2.1.9/features/shaded_jar) for more on this. If you use the shaded driver jar, you should manually exclude the non-shaded Netty dependency. Finally, to turn off this warning, set the logger `com.datastax.driver.core.NettyUtil` to `ERROR` or `OFF`. – adutra Nov 27 '15 at 11:43
  • Hmm, sorry, I did not read your stack trace correctly, so please disregard my answer. The first line is indeed a warning about Netty's native transport, but the rest is unrelated and probably due to a misconfiguration in the driver: it is trying to connect to 10.223.134.106 on port 9042, but there is no process listening there. Did you start a Cassandra node on this machine/port? Can you connect to it through cqlsh? – adutra Nov 27 '15 at 11:52
  • This is the default when I enter cqlsh through the terminal: Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.0.0 | CQL spec 3.3.1 | Native protocol v4]; however, I specifically declare 127.0.0.1 as the connection host (above). – ahlusar1989 Nov 27 '15 at 12:06
  • Oh, I see; there might be an error in your command. The right command should be: `conf.set("spark.cassandra.connection.host", "127.0.0.1")` (see the corrected snippet after this thread). – adutra Nov 27 '15 at 12:15
  • Ah, I will try that. Why does one need the additional "spark" prefix? – ahlusar1989 Nov 27 '15 at 12:23
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/96352/discussion-between-adutra-and-ahlusar1989). – adutra Nov 27 '15 at 12:29
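To make the fix from this thread concrete, here is a minimal sketch of the corrected session from the question, with the `spark.` prefix added (host, app name, keyspace, and table are all taken from the original post):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    // Note the "spark." prefix: the connector looks up
    // "spark.cassandra.connection.host"; the un-prefixed key used in
    // the question is not picked up by the connector.
    val conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("local[*]", "Cassandra Test", conf)

    val table = sc.cassandraTable("movies", "movieinfo")
    table.count

And to silence the Netty warning discussed above, a one-line addition to Spark's conf/log4j.properties should work (assuming the stock log4j 1.x setup that Spark 1.x ships with):

    log4j.logger.com.datastax.driver.core.NettyUtil=ERROR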

Encountered the same problem on Mac OS running spark-shell. The solution was to pass `--conf spark.driver.host=localhost` to spark-shell.
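For example, launching the shell as (assuming a standard Spark install with spark-shell on the PATH):

    spark-shell --conf spark.driver.host=localhost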

For some reason, on Mac OS the computer's local IP is assigned to `spark.driver.host` by default.

Opening http://<local ip>:4040 in the browser does not work, but opening http://localhost:4040 does.