Kafka Spark streaming HBase insert issues

Question

I'm using Kafka to send a file with 3 columns using Spark streaming 1.3 to insert into HBase. This is how my HBase looks like :

ROW                      COLUMN+CELL
 zone:bizert             column=travail:call, timestamp=1491836364921, value=contact:numero
 zone:jendouba           column=travail:Big data, timestamp=1491835836290, value=contact:email
 zone:tunis              column=travail:info, timestamp=1491835897342, value=contact:num
3 row(s) in 0.4200 seconds

And this is how I read data with spark streaming, I'm using spark-shell:

import org.apache.spark.streaming.{ Seconds, StreamingContext }
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
 val ssc = new StreamingContext(sc, Seconds(10))
 val topicSet = Set ("zed")
 val kafkaParams = Map[String, String]("metadata.broker.list" -> "xx.xx.xxx.xx:9092")
 val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)
 lines.foreachRDD(rdd => { (!rdd.partitions.isEmpty)
 lines.saveAsTextFiles("hdfs://xxxxx:8020/user/admin/zed/steams3/")
})

this code is working when I'm saving data into HDFS even it save many empty data to HDFS. before writing this question I was searching here and some other question like mine but I didn't get a good solution.

May you propose the best way to do this?. This is how my code look now

val sc = new SparkContext("local", "Hbase spark")
val tableName = "notz"
    val conf = HBaseConfiguration.create()
    conf.addResource(new Path("file:///opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/hbase/conf.dist/hbase-site.xml"))
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val admin = new HBaseAdmin(conf)
lines.foreachRDD(rdd => { (!rdd.partitions.isEmpty)
if(!admin.isTableAvailable(tableName)) {

      print("Creating GHbase Table")
      val tableDesc = new HTableDescriptor(tableName)
      tableDesc.addFamily(new HColumnDescriptor("zone"
                                    .getBytes()))

      admin.createTable(tableDesc)

    }else{
      print("Table already exists!!")
    }
val myTable = new HTable(conf, tableName)

// i'm blocked here
    })

Kafka Spark streaming HBase insert issues

0 Answers0