Have you ever used Saiku to do data analysis on a Big Data platform (Hadoop)? My recent work needs to integrate some legacy BI tools with Hadoop to support common OLAP queries on HDFS/HBase.

I found a solution implemented with Phoenix & HBase here, which bridges Saiku and HBase through Phoenix's SQL dialect, and it worked. However, this method can only handle data that already lives in HBase, accessed through the HBase API. It cannot launch any MapReduce-style jobs when building the data cube. I would prefer a more Hadoop-native alternative, such as going through Apache Hive.

Saiku is based on Mondrian. My version of Saiku uses Mondrian-4.0.0.0-SNAPSHOT.jar, which I found already works well with Hive, and there are many Hive 0.13 jars in Saiku's lib directory. So I thought a simple configuration of a hive2 datasource would work. I started a HiveServer2 on the namenode of my HDFS cluster and added the following datasource to Saiku:

Name: hive2
Connection Type: Mondrian
URL: jdbc:hive2://localhost:10000/default
Schema: /datasources/movie.xml
Jdbc Driver: org.apache.hive.jdbc.HiveDriver
Username: ubuntu
Password: XXXX
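
For reference, the Hive driver itself can be exercised outside Saiku with plain JDBC. A minimal sketch, assuming the same host, port, and credentials as the datasource above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Standalone check of the Hive JDBC driver, independent of Mondrian/Saiku.
    public class HiveJdbcCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "ubuntu", "XXXX");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If this works, at least the URL, driver, and credentials are correct.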

Saiku did connect to the HiveServer2 successfully, but failed to load the datasource. I found the following error in the Saiku log:

name:hive2
driver:mondrian.olap4j.MondrianOlap4jDriver
url:jdbc:mondrian:Jdbc=jdbc:hive2://localhost:10000/default;Catalog=mondrian:///datasources/movie.xml;JdbcDrivers=org.apache.hive.jdbc.HiveDriver
12:41:48,110 WARN  [RolapSchema] Model is in legacy format
12:41:50,464 ERROR [SecurityAwareConnectionManager] Error connecting: hive2
mondrian.olap.MondrianException: Mondrian Error:Internal error: while quoting identifier
    at mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:992)
    at mondrian.olap.Util.newInternal(Util.java:2543)
    at mondrian.spi.impl.JdbcDialectImpl.deduceIdentifierQuoteString(JdbcDialectImpl.java:245)
    at mondrian.spi.impl.JdbcDialectImpl.<init>(JdbcDialectImpl.java:146)
    at mondrian.spi.DialectManager$DialectManagerImpl$1.createDialect(DialectManager.java:210)
...
Caused by: java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveDatabaseMetaData.getIdentifierQuoteString(HiveDatabaseMetaData.java:342)
    at org.apache.commons.dbcp.DelegatingDatabaseMetaData.getIdentifierQuoteString(DelegatingDatabaseMetaData.java:306)
    at mondrian.spi.impl.JdbcDialectImpl.deduceIdentifierQuoteString(JdbcDialectImpl.java:238)
    ... 99 more

I looked into the Hive 0.13 source and found that getIdentifierQuoteString isn't implemented yet; it simply throws an exception:

    // org.apache.hive.jdbc.HiveDatabaseMetaData (Hive 0.13)
    public String getIdentifierQuoteString() throws SQLException {
        throw new SQLException("Method not supported");
    }
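
Looking at the Mondrian side of the trace, the failure happens while the dialect is being constructed. Reconstructed roughly from the stack trace (a sketch, not the verbatim Mondrian source), JdbcDialectImpl does something like:

    // Sketch of mondrian.spi.impl.JdbcDialectImpl, reconstructed from
    // the stack trace above; not the verbatim source.
    protected String deduceIdentifierQuoteString(DatabaseMetaData databaseMetaData) {
        try {
            String quoteString = databaseMetaData.getIdentifierQuoteString();
            // An empty string is treated as "identifier quoting not supported".
            return "".equals(quoteString) ? null : quoteString;
        } catch (SQLException e) {
            throw Util.newInternal(e, "while quoting identifier");
        }
    }

So any non-throwing return value would get past this point; the driver doesn't need real quoting support, it just must not throw.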

So now I'm puzzled. Is it practical to use Saiku with Hive at all? It ships Hive 0.13 jars in its lib directory, yet it cannot load a simple Hive datasource. Should I simply modify the Hive source? I found that in the newly released Hive 1.0, this function is implemented by simply returning an empty string.
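
Rather than patching and rebuilding Hive, a thin wrapper could intercept the unimplemented metadata call before Mondrian sees it. Below is a minimal sketch using dynamic proxies; the class name HiveMetaDataShim is my own, and wiring it into Saiku (which obtains connections through commons-dbcp, as the stack trace shows) would still need a small delegating JDBC driver, which I leave out here:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.sql.Connection;
    import java.sql.DatabaseMetaData;

    // Wraps a Hive JDBC connection so that the metadata method Hive 0.13
    // leaves unimplemented returns a benign default instead of throwing
    // "Method not supported".
    public final class HiveMetaDataShim {

        public static Connection wrap(final Connection real) {
            return (Connection) Proxy.newProxyInstance(
                HiveMetaDataShim.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args)
                            throws Throwable {
                        try {
                            Object result = m.invoke(real, args);
                            if ("getMetaData".equals(m.getName())) {
                                return wrapMetaData((DatabaseMetaData) result);
                            }
                            return result;
                        } catch (InvocationTargetException e) {
                            throw e.getCause();
                        }
                    }
                });
        }

        private static DatabaseMetaData wrapMetaData(final DatabaseMetaData real) {
            return (DatabaseMetaData) Proxy.newProxyInstance(
                HiveMetaDataShim.class.getClassLoader(),
                new Class<?>[] {DatabaseMetaData.class},
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args)
                            throws Throwable {
                        if ("getIdentifierQuoteString".equals(m.getName())) {
                            // An empty string gets past Mondrian's dialect
                            // setup (treated as "no identifier quoting").
                            return "";
                        }
                        try {
                            return m.invoke(real, args);
                        } catch (InvocationTargetException e) {
                            throw e.getCause();
                        }
                    }
                });
        }
    }

Even with this one method bypassed, other unimplemented metadata methods in Hive JDBC 0.13 might fail next, so I'm not sure this is the right direction.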

Does anyone have a good idea? Thanks!

  • I bypassed some unimplemented functions in Hive-JDBC-0.13.1, but I still cannot handle Hive tables. I'll keep looking into this question. – He Bai Mar 06 '15 at 15:11
  • > "Map-Reduce style job when building the data cube" - why would you need a MapReduce job if you can run a much faster SELECT in Phoenix/HBase on the same data sitting in Hadoop? Have you seen the performance comparison: http://phoenix.apache.org/performance.html ? – alex May 01 '15 at 18:23
  • @alex The major reason is that I cannot require all the data to be hosted in HBase. Most data is uploaded into an HDFS directory or into Hive. With the Phoenix/HBase solution, I would have to repeatedly run ETL over the data in HDFS. – He Bai May 14 '15 at 05:24
