
I am trying to visualise data from Spark in Kibana. I create an RDD from Cassandra as follows:

    val test = sc.cassandraTable("test","data")

Then I used the Elasticsearch and Hadoop library to stream to Elasticsearch with following:

    EsSpark.saveToEs(test, "spark/docs", Map("es.nodes" -> "192.168.1.88"))

but I get this error:

15/04/20 16:15:27 ERROR TaskSetManager: Task 0 in stage 12.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 36, 192.168.1.92): org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: Cannot handle type [class com.datastax.spark.connector.CassandraRow]

Could anyone guide me on streaming from Spark to Elasticsearch? Is there a better way to visualize data from Cassandra, Solr, or Spark? I came across Banana, but it does not seem to have an option for publishing dashboards.

Thanks

kmakma

1 Answer


According to the Spark Cassandra Connector guide, you can define a case class, have the connector map each CassandraRow to an instance of that class, and then save those objects to Elasticsearch. Below is the sample code from the guide:

import com.datastax.spark.connector._
import com.datastax.spark.connector.mapper.DefaultColumnMapper

case class WordCount(w: String, c: Int)

object WordCount {
  // Maps the case class fields to the Cassandra column names
  implicit object Mapper extends DefaultColumnMapper[WordCount](
    Map("w" -> "word", "c" -> "count"))
}

sc.cassandraTable[WordCount]("test", "words").toArray
// Array(WordCount(bar,20), WordCount(foo,10))

sc.parallelize(Seq(WordCount("baz", 30), WordCount("foobar", 40)))
  .saveToCassandra("test", "words", SomeColumns("word", "count"))
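Applied to your case, the idea would look roughly like the sketch below. The case class `DataRow` and its fields are hypothetical placeholders; replace them with the actual columns of your `test.data` table. Once the RDD is typed as case-class instances instead of `CassandraRow`, elasticsearch-hadoop can serialize it:

```scala
import com.datastax.spark.connector._
import org.elasticsearch.spark.rdd.EsSpark

// Hypothetical schema: adjust the fields to match your "data" table.
case class DataRow(id: String, value: Double)

// The connector maps columns to fields by name, giving an RDD[DataRow]
// instead of an RDD[CassandraRow], which EsSpark cannot handle.
val typed = sc.cassandraTable[DataRow]("test", "data")

EsSpark.saveToEs(typed, "spark/docs", Map("es.nodes" -> "192.168.1.88"))
```

If the column names do not line up with the field names, you can instead `map` over the raw rows yourself, e.g. `sc.cassandraTable("test", "data").map(r => DataRow(r.getString("id"), r.getDouble("value")))`, before calling `saveToEs`.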
aaskey