Ok, I've been dealing with this problem for a couple of days and it's driving me nuts. I need to use the Hive database with transactions to perform 'update' and 'delete' operations.
I have installed Hadoop and Hive on my machine in pseudo-distributed mode. I have followed this tutorial for the installation. I'm using Java 1.8.0_31, Hadoop 2.6.0, Hive 1.0.0 and there were also a couple of details I changed, but these shouldn't be relevant.
Now, to start my environment (after a reboot, for example), I run the following:
start-dfs.sh
start-yarn.sh
java -jar /usr/local/derby/lib/derbyrun.jar server start &
hive
And everything seems to work fine. Although the tutorial doesn't mention starting Derby, if I don't start it, the metastore isn't available (which seems logical) and Hive doesn't start.
From here, I can create tables, show tables, connect with my JDBC client, and so on; everything works great. Now, I need to enable transactions. Following this link and this link I get to the following command:
hive --hiveconf hive.root.logger=info,console \
  --hiveconf hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager \
  --hiveconf hive.compactor.initiator.on=true \
  --hiveconf hive.compactor.worker.threads=1 \
  --hiveconf hive.txn.driver=jdbc:derby://localhost:1527/metastore_db;create=true
Sidenote: I'm changing the command and not hive-site.xml just because it's easier to change between commands when trying what works and what doesn't work instead of repeatedly changing the XML file.
I have also tried changing the driver URL to jdbc:derby://localhost:1527/metastore_db;create=true;user=APP;password=mine just in case it was needed, but there's no change. When I issue a command (like show tables), I get an error:
15/03/04 23:26:17 [main]: ERROR metastore.RetryingHMSHandler:
MetaException(message:Unable to select from transaction database,
java.sql.SQLSyntaxErrorException: Table/View 'TXNS' does not exist.
According to this and one of the previous links, it seems the hive.in.test property must be set to true. So, my launch command becomes:
hive --hiveconf hive.root.logger=info,console \
  --hiveconf hive.in.test=true \
  --hiveconf hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager \
  --hiveconf hive.compactor.initiator.on=true \
  --hiveconf hive.compactor.worker.threads=1 \
  --hiveconf hive.txn.driver=jdbc:derby://localhost:1527/metastore_db;create=true;
With this command, I get a new error:
ERROR metastore.RetryingHMSHandler: java.lang.NullPointerException
at org.apache.hadoop.hive.metastore.txn.TxnHandler.checkQFileTestHack(TxnHandler.java:1146)
I can't find this error anywhere; I feel like I'm the only person on the internet with it. Because I couldn't find any solution, I dug into the source code:
private void checkQFileTestHack() {
  boolean hackOn = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) ||
      HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEZ_TEST);
  LOG.info("Before if");
  if (hackOn) {
    LOG.info("Hacking in canned values for transaction manager");
    // Set up the transaction/locking db in the derby metastore
    TxnDbUtil.setConfValues(conf);
    try {
      TxnDbUtil.prepDb();
    } catch (Exception e) {
      // We may have already created the tables and thus don't need to redo it.
      if (!e.getMessage().contains("already exists")) {
        throw new RuntimeException("Unable to set up transaction database for" +
            " testing: " + e.getMessage());
      }
    }
  }
}
Line 1146 is the if (!e.getMessage().contains("already exists")) line, which doesn't seem to make much sense unless "e" is null, which is strange. Anyway, I thought I could debug this further by adding a few more logging messages, building the project, and replacing the original metastore jar (which is where this TxnHandler class lives) with my modified one. For that, I downloaded the source code and followed this to build it. I tried Maven 2 and it didn't work, because some plug-in only worked with Maven 3, so I got Maven 3 from here and built the project.
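A side note on line 1146: in Java the caught exception e itself can never be null, but e.getMessage() can return null when the exception was created without a message, and that alone would make the contains(...) call throw a NullPointerException. Here is a minimal sketch of that behaviour, with a null-safe variant of the check. NullMessageDemo and isAlreadyExists are made-up names for illustration and are not part of Hive, and I haven't confirmed this is what actually happens inside prepDb():

```java
public class NullMessageDemo {
    // Null-safe variant of the "already exists" check from the catch block.
    // (Hypothetical helper, not part of the Hive source.)
    static boolean isAlreadyExists(Exception e) {
        String msg = e.getMessage(); // null when the exception carries no message
        return msg != null && msg.contains("already exists");
    }

    public static void main(String[] args) {
        // An exception constructed without a message has a null getMessage().
        Exception noMessage = new IllegalStateException();
        System.out.println("message is null: " + (noMessage.getMessage() == null));

        // The unguarded call e.getMessage().contains(...) would throw here;
        // the guarded helper just reports false.
        System.out.println("already exists?  " + isAlreadyExists(noMessage));
        System.out.println("already exists?  "
                + isAlreadyExists(new Exception("Table 'TXNS' already exists")));
    }
}
```

If this is what's going on, the NullPointerException would be hiding whatever exception prepDb() really threw, so logging e itself before calling getMessage() would be the thing to try.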
If I build it with the mvn clean install -Phadoop-2,dist command, not only does it take forever, but it also fails during the test phase. Because it doesn't fail on the metastore module (there it skips 1 test, and I'm not sure that's supposed to happen), I thought I could just build it without running the tests. So, we get to this:
mvn clean install -DskipTests -Phadoop-2,dist
rm /usr/local/hive/lib/hive-metastore-1.0.0.jar
cp packaging/target/apache-hive-1.0.0-bin/apache-hive-1.0.0-bin/lib/hive-metastore-1.0.0.jar /usr/local/hive/lib/
Sidenote: in the interest of time, I also tried the -pl metastore -am arguments, but while Maven says that metastore has been built, the jar in the lib folder does not change, so I'm guessing I'm doing something wrong.
Anyway, this should build my modified jar and replace the one in Hive, so that when I start Hive again it loads mine. However, even after I change the code, the error is exactly the same: my new logging output doesn't appear, and even the line number in the stack trace stays the same. It's as if I changed nothing in my new jar.
It's strange. I know Maven is compiling my code because it reports compile errors, and I can see from the jar's properties that it's a new file, so why don't my changes show up? Hive notices when I delete the original jar, but when I replace it with my modified version, it's as if I changed nothing.
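One way to check whether the JVM really picks up the replaced jar would be to ask a class where it was loaded from. This is a minimal sketch, assuming it is run with the same jars on the classpath that Hive uses. WhichJar and locationOf are names I made up, and with no arguments it just reports on itself:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from.
    // (Hypothetical helper for illustration.)
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader have no code source.
        if (src == null || src.getLocation() == null) {
            return "(bootstrap/unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments, e.g.
        // org.apache.hadoop.hive.metastore.txn.TxnHandler
        String[] names = args.length > 0 ? args : new String[] { WhichJar.class.getName() };
        for (String name : names) {
            // initialize=false: only locate the class, don't run its static initializers
            Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " <- " + locationOf(cls));
        }
    }
}
```

If the location printed for TxnHandler is not /usr/local/hive/lib/hive-metastore-1.0.0.jar, then some other jar on the classpath contains the class and shadows my modified copy.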
Anyway, as you can see, I've had many troubles and I've tried to fix most of them. But now I'm stuck on this one, unable to use a damn "delete" command because I can't enable transactions. Can anyone point me in the right direction? Tyvm!
... and sorry for the long post.