
I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code against the list of jar files I grabbed from Dataproc (listed under Details below). When I run the Pig script below, it fails when it tries to establish a connection with Bigtable.

Error message is:

Bigtable does not support managed connections.

Questions:

  1. Is there a workaround for this problem?
  2. Is this a known issue and is there a plan to fix or adjust?
  3. Is there a different way of implementing multiple scans in a Pig load function that will work with Bigtable?

Details:

Jar files:

hadoop-common-2.7.3.jar 
hbase-client-1.2.2.jar
hbase-common-1.2.2.jar
hbase-protocol-1.2.2.jar
hbase-server-1.2.2.jar
pig-0.16.0-core-h2.jar

Here's a simple Pig script using my custom load function:

%default gte         '2017-03-23T18:00Z'
%default lt          '2017-03-23T18:05Z'
%default SHARD_FIRST '00'
%default SHARD_LAST  '25'
%default GTE_SHARD   '$gte\_$SHARD_FIRST'
%default LT_SHARD    '$lt\_$SHARD_LAST'
raw = LOAD 'hbase://events_sessions'
      USING com.eduboom.pig.load.HBaseMultiScanLoader('$GTE_SHARD', '$LT_SHARD', 'event:*')
      AS (es_key:chararray, event_array);
DUMP raw;

My custom load function HBaseMultiScanLoader creates a list of Scan objects to perform multiple scans over different ranges of data in the events_sessions table, determined by the time range between gte and lt and sharded from SHARD_FIRST through SHARD_LAST.

HBaseMultiScanLoader extends org.apache.pig.LoadFunc so it can be used in the Pig script as a load function. When Pig runs my script, it calls LoadFunc.getInputFormat(). My implementation of getInputFormat() returns an instance of my custom class MultiScanTableInputFormat, which extends org.apache.hadoop.mapreduce.InputFormat. MultiScanTableInputFormat initializes an org.apache.hadoop.hbase.client.HTable object to set up the connection to the table.
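
For reference, here is a minimal sketch of the pattern described above that triggers the error. The class and field names are illustrative (my actual MultiScanTableInputFormat source isn't shown here); the key point is the deprecated HTable constructor:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

    // Illustrative reconstruction of the failing code path.
    public class MultiScanTableInputFormatSketch implements Configurable {
      private Configuration conf;
      private HTable table;

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        try {
          // The deprecated HTable(Configuration, String) constructor goes through
          // ConnectionManager.getConnectionInternal(), which hardcodes managed=true,
          // exactly the value AbstractBigtableConnection rejects.
          this.table = new HTable(conf, conf.get(TableInputFormat.INPUT_TABLE));
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      @Override
      public Configuration getConf() {
        return conf;
      }
    }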

Digging into the hbase-client source code, I see that org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal() calls org.apache.hadoop.hbase.client.ConnectionManager.createConnection() with the attribute "managed" hardcoded to "true". You can see from the stack trace below that my code (MultiScanTableInputFormat) tries to initialize an HTable object, which invokes getConnectionInternal(), which does not provide an option to set managed to false. Going down the stack trace, you reach AbstractBigtableConnection, which will not accept managed=true and therefore causes the connection to Bigtable to fail.

Here’s the stack trace showing the error:

2017-03-24 23:06:44,890 [JobControl] ERROR com.turner.hbase.mapreduce.MultiScanTableInputFormat - java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
    at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
    at com.eduboom.hbase.mapreduce.MultiScanTableInputFormat.setConf(Unknown Source)
    at com.eduboom.pig.load.HBaseMultiScanLoader.getInputFormat(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:264)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 26 more
Caused by: java.lang.IllegalArgumentException: Bigtable does not support managed connections.
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
    at com.google.cloud.bigtable.hbase1_2.BigtableConnection.<init>(BigtableConnection.java:55)
    ... 31 more
  • It looks like MultiScanTableInputFormat.setConf is invoking the deprecated HTable(Configuration) constructor. Instead, call ConnectionFactory#createConnection(Configuration) and then Connection#getTable(TableName) to retrieve a Table instance. The following doc has changes from earlier HBase versions to Cloud Bigtable / HBase 1.2: https://cloud.google.com/bigtable/docs/hbase-api-changes – Angus Davis Mar 25 '17 at 01:31
  • Thanks for the input. I'm working on adjusting the code to use the newer packages. – EduBoom Mar 27 '17 at 19:08
  • I'm using the newer libraries now, but it fails getting splits: job_1489760783427_0051 raw MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Can't get the locations at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279) ... Caused by: ... at org.apache.hadoop.hbase.client.HRegionLocator.getStartEndKeys(HRegionLocator.java:122) at com.turner.hbase.mapreduce.MultiScanTableInputFormat.getSplits(Unknown Source) I'll keep digging, but if you know what's amiss, please share. – EduBoom Mar 28 '17 at 01:07
  • I'm trying to understand why it's trying to connect to ZooKeeper in the localhost. Connection refused, obviously. It should be somewhere in BigTable. 2017-03-28 00:50:00,993 [JobControl] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watc her=hconnection-0x38d443990x0, quorum=localhost:2181, baseZNode=/hbase – EduBoom Mar 28 '17 at 01:13
  • You're absolutely correct that there should be no ZK interactions. Often when I see things like this in my own experiments, it's a sign that the connection impl property isn't being propagated or specified properly (or something is loading a different Configuration object, etc) – Angus Davis Mar 28 '17 at 19:57
  • I'm marking this question as answered because even though I did not get past the issue related to ZK, the original question was answered. The problem was caused by the use of deprecated hbase client classes. I'll start another question related to the ZK issue if I can't solve it soon. – EduBoom Mar 29 '17 at 18:27

2 Answers


The original problem was caused by the use of outdated and deprecated hbase client jars and classes.

I updated my code to use the newer HBase client jars provided by Google, and the original problem was fixed.
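
For anyone hitting the same error, here is a minimal sketch of the non-deprecated pattern, assuming the bigtable-hbase-1.2 client jar is on the classpath and the Configuration already carries the Bigtable settings. The class and method names are illustrative, not my actual loader code:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;

    public class ConnectionSketch {
      // ConnectionFactory never asks for a "managed" connection, so this path
      // works with AbstractBigtableConnection.
      public static void openTable(Configuration conf, String tableName) throws IOException {
        TableName name = TableName.valueOf(tableName);
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(name);
             RegionLocator locator = connection.getRegionLocator(name)) {
          // Use table for the scans and locator.getStartEndKeys() when computing splits.
        }
      }
    }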

I'm still stuck on a ZooKeeper (ZK) issue that I haven't figured out yet, but that's a conversation for a different question.
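
In case it helps the next person: per the comment thread above, a ZooKeeper connection attempt to localhost usually means the Bigtable settings never made it into the Configuration the InputFormat sees. This is only a sketch of the standard bigtable-hbase-1.x properties; the project and instance values are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BigtableConfSketch {
      // Builds a Configuration that routes the HBase client to Bigtable
      // instead of a local ZooKeeper quorum. "my-project" and "my-instance"
      // are placeholders for the real project and instance ids.
      public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.client.connection.impl",
                 "com.google.cloud.bigtable.hbase1_2.BigtableConnection");
        conf.set("google.bigtable.project.id", "my-project");
        conf.set("google.bigtable.instance.id", "my-instance");
        return conf;
      }
    }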

This one is answered!

– EduBoom

I ran into the same error message:

Bigtable does not support managed connections.

According to my research, the root cause is that the HTable class should not be constructed directly. After I changed the code to obtain the table through Connection#getTable, the problem was resolved.
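
A minimal illustration of that change (the table name below is only an example):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    public class GetTableSketch {
      public static Table open(Configuration conf) throws IOException {
        // Before (fails against Bigtable): new HTable(conf, "events_sessions");
        Connection connection = ConnectionFactory.createConnection(conf);
        // After: obtain the Table through the Connection instead.
        return connection.getTable(TableName.valueOf("events_sessions"));
      }
    }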

– shawn.wang