I am trying to create an HBase connection inside a MapPartitionsFunction in Spark, but it fails with:

Caused by: java.io.NotSerializableException: org.apache.hadoop.conf.Configuration

I tried the following code:

SparkConf conf = new SparkConf()
        .setAppName("EnterPrise Risk Score")
        .setMaster("local");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrationRequired", "true");
conf.registerKryoClasses(new Class<?>[] {
        Class.forName("org.apache.hadoop.conf.Configuration"),
        Class.forName("org.apache.hadoop.hbase.client.Table"),
        Class.forName("com.databricks.spark.avro.DefaultSource$SerializableConfiguration")});
SparkSession sparkSession = SparkSession.builder().config(conf)
        .getOrCreate();
Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);

I am using sparkSession to create a Dataset and passing hbaseConf into the partition function to create connections to HBase.

Is there any way to connect to HBase from inside the partitions?

Ram Ghadiyaram
Pradeep

1 Answer


You are probably implicitly passing the HBase configuration into a Spark action, like this:

Configuration hbaseConfiguration = HBaseConfiguration.create();
sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class).mapPartitions(i -> {
    Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
    // more valid code
});

Why don't you just create the Configuration right inside of it, like this:

sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class).mapPartitions(i -> {
    Configuration hbaseConfiguration = HBaseConfiguration.create();
    hbaseConfiguration.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
    Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
    // more valid code
});
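Worth spelling out why the first version fails: Spark ships your lambda to executors using Java serialization, and since `org.apache.hadoop.conf.Configuration` does not implement `java.io.Serializable`, serializing the closure that captured it throws exactly the exception in your stack trace. Creating the configuration inside `mapPartitions` means it is constructed on the executor and never serialized at all. Here is a standalone sketch of the same failure mode using plain Java serialization (no Spark or HBase required; `FakeConfiguration` and `Task` are illustrative stand-ins, not real Hadoop/Spark classes):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // Stand-in for org.apache.hadoop.conf.Configuration: it holds state
    // but does NOT implement Serializable.
    static class FakeConfiguration {
        String quorum = "localhost";
    }

    // Stand-in for the closure Spark builds around your lambda: the task
    // itself is Serializable, but it captures a non-serializable field,
    // which dooms the whole object graph.
    static class Task implements Serializable {
        final FakeConfiguration conf;
        Task(FakeConfiguration conf) { this.conf = conf; }
    }

    // Attempt Java serialization -- the same mechanism Spark uses by
    // default when shipping a closure to executors.
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            // The exception message names the offending class.
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Fails: the captured FakeConfiguration cannot be serialized.
        System.out.println(trySerialize(new Task(new FakeConfiguration())));
        // Succeeds: an ordinary serializable object is fine.
        System.out.println(trySerialize("a plain string"));
    }
}
```

Registering the class with Kryo (as in the question) does not help here, because the closure itself is still serialized with Java serialization; keeping non-serializable resources out of the captured scope is the reliable fix.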
MaxNevermind
  • [mapPartitions method](http://stackoverflow.com/questions/21185092/apache-spark-map-vs-mappartitions) is good for creating connections or objects – Ram Ghadiyaram Sep 03 '16 at 09:50