
I am upgrading my server to Spark 2.3.0 and job-server 0.8.1-SNAPSHOT from Spark 2.1.1 and job-server 0.8.0 (which were working fine). I am using the JobSqlDao with MySQL, and the SessionContextFactory to create a SQLContext. In local.conf, I have:

sql-context {
    context-factory = spark.jobserver.context.SessionContextFactory
}
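
From what I can tell, a session-based context is roughly equivalent to building a Hive-enabled SparkSession by hand. This sketch is my assumption about what happens under the hood, not the actual job-server source, but it would explain why Hive gets dragged in:

import org.apache.spark.sql.SparkSession

// My rough approximation (not job-server's actual code) of what a
// SparkSession-based context amounts to. enableHiveSupport() is the
// call that makes Spark stand up an embedded Derby metastore on first
// use, creating metastore_db in the current working directory.
val spark = SparkSession.builder()
  .master("local[*]")      // job-server supplies the real master setting
  .appName("sql-context")
  .enableHiveSupport()
  .getOrCreate()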

Everything started up fine, and things looked good at ports 8090 and 4040, but when I issued my first request to job-server, I got:

ERROR XBM0H: Directory /metastore_db cannot be created

After searching the web, this looks like something to do with Hive. I don't use Hive in my code, but the new SessionContextFactory seems to require it. So I added a hive-site.xml file in spark/conf with the following contents:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/mineset/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
</configuration>

but now I get this error:

Caused by: java.net.ConnectException: Call From ra.esi-internal.esi-group.com/192.168.xxx.xx to 192.168.xxx.xx:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
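
My guess is that with Hive support enabled, the warehouse directory now resolves against HDFS (port 8020 is where a NameNode would listen), and nothing is running there. Web posts suggest the warehouse can be pinned to the local filesystem with a property like the one below in hive-site.xml; the property value and the file:// path are my untested guess for this setup (and in Spark 2.x the spark.sql.warehouse.dir setting reportedly supersedes it):

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>file:///home/mineset/spark-warehouse</value>
  <description>Untested: keep the warehouse on the local filesystem instead of HDFS</description>
</property>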

I also tried having the hive-site.xml configured to use MySQL (actually MariaDB), but then I got an error like this:

ENGINE=INNODB : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes

A web search indicates that this might be fixed by upgrading to MySQL 5.7 (I am currently using 5.5 and really don't want to upgrade).
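
One workaround I have seen mentioned for the 767-byte limit (untested here, and metastore_db is just a placeholder for whatever the metastore schema is called) is to switch the metastore database to latin1, so the schema's VARCHAR(255) index keys stay under the InnoDB limit:

-- Untested suggestion from web posts: under utf8 each character can take
-- 3 bytes, so a VARCHAR(255) index key exceeds InnoDB's 767-byte limit.
ALTER DATABASE metastore_db CHARACTER SET latin1 COLLATE latin1_bin;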

Is there a context factory that I can use to get a SQLContext that does not require Hive? The SessionContextFactory is new in 0.8.1, and seems to be the source of my problems. There seem to be many ways to configure Hive (with Derby, MySQL, embedded, client, etc.), but I don't think it will matter in my case how it is configured, since I don't use it; I just want to find a simple setup that does not give an error. Any suggestions?
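
For what it's worth, the 0.8.0 docs listed a plain SQLContextFactory in job-server-extras; if it is still available in 0.8.1, I assume the context definition would look like this (untested):

sql-context {
    # Untested assumption: a SQLContext-based factory that skips SparkSession/Hive
    context-factory = spark.jobserver.context.SQLContextFactory
}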

user1933178
  • SparkSQL needs a "metastore" stub, with or without Hive. If no proper Hive Metastore service is provided, then it automatically starts an "embedded" Metastore backed by a Derby DB instance. By default, that Derby instance creates its log file and data directory in the **current working directory**. Looks like your CWD is `/`, which is quite dangerous, generally speaking... – Samson Scharfrichter Apr 30 '18 at 21:10
  • Port 8020 is used by the HDFS NameNode, so that error has nothing to do with Hive. And the log fragment displayed contains no useful information, because the `sun.reflect.*` stuff refers to generic JVM "reflection" that could apply to any class. You should skip these lines and display the actual class being instantiated. – Samson Scharfrichter Apr 30 '18 at 21:17
  • For the "embedded metastore" Derby files location issue, cf. https://issues.apache.org/jira/browse/SPARK-4758 and https://stackoverflow.com/questions/38377188/how-to-get-rid-of-derby-log-metastore-db-from-spark-shell >> for the HDFS connection issue, do some debugging. – Samson Scharfrichter Apr 30 '18 at 21:30

0 Answers