Connect Spark with BI tools like Power BI and Tableau

Question

I need to connect Spark with Power BI, but I don't know which drivers are required. I am running Spark in local mode without installing Apache Hive, so I don't have a hive-site.xml file for configuring the Thrift server. After starting the Thrift server, I started $SPARK_HOME\bin\beeline.cmd and connected to the Thrift server with the command !connect jdbc:hive2://localhost:10000, using Administrator as the user ID (the same as on my local machine) and a blank password. The output was:

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: Administrator
Enter password for jdbc:hive2://localhost:10000:
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.0.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ

It seems that the connection is made, but when I query the databases with the command show databases; it shows this error (in beeline):

Error: org.apache.thrift.transport.TTransportException: java.net.SocketException: Software caused connection abort: socket write error (state=08S01,code=0)

and this error (in the Thrift server cmd window):

Exception in thread "HiveServer2-Handler-Pool: Thread-XXX"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HiveServer2-Handler-Pool: Thread-XXX"

I don't understand this error. Please help me with it. I also want to connect Spark to Power BI Desktop installed on my local machine. Can someone provide some links to read about making the connection?

score 0 · Answer 1 · edited May 23 '17 at 10:33

@Birla, it looks like a TCP error, as mentioned in the question asked here.

Using the Thrift server on a local machine is not recommended, because it needs a good amount of processing power and dedicated metastore servers to handle authentication and parallelism.

Recommended: install a ready-to-use Hortonworks or Cloudera VM and then access it from Power BI.
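
If you still want to test locally first: the server-side log in the question shows a java.lang.OutOfMemoryError, so one thing worth trying is giving the Thrift server's driver JVM more heap. The sketch below only builds the spark-submit argument list. The helper name thrift_server_command and the 2g default are assumptions for illustration, not values from the question, and more memory may not be the whole fix.

```python
import os

# Main class of the Spark Thrift JDBC/ODBC server.
THRIFT_SERVER_CLASS = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"


def thrift_server_command(spark_home, driver_memory="2g", port=10000):
    """Build a spark-submit argument list for a local Thrift server.

    Hypothetical helper: driver_memory and port are illustrative defaults.
    """
    # On Windows the launcher script is spark-submit.cmd.
    script = "spark-submit.cmd" if os.name == "nt" else "spark-submit"
    return [
        os.path.join(spark_home, "bin", script),
        "--class", THRIFT_SERVER_CLASS,
        # In local mode the driver JVM does all the work, so this is the heap that matters.
        "--driver-memory", driver_memory,
        # Port the server listens on in the default (binary) transport mode.
        "--hiveconf", "hive.server2.thrift.port={}".format(port),
    ]
```

The resulting list can be passed to subprocess.call, for example subprocess.call(thrift_server_command(os.environ["SPARK_HOME"], driver_memory="4g")).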

answered Nov 23 '16 at 11:13

JustCoder

Thank you for your reply. I am just testing the connections for now. Once that is finished, I will work in cluster mode with dedicated metastore servers. I am not able to identify the exact remedy for the error. Can you be more specific about the solution? – Bhanuday Birla Nov 23 '16 at 13:32
@JustCoder... I am also confused about whether to use hive-site.xml or not. If I don't use it, what would my credentials be, and what would be the spark-warehouse directory that my BI tool accesses? – Bhanuday Birla Nov 24 '16 at 10:47
By default Spark uses the folder /usr/hive/warehouse/ for all Hive-related queries and /usr/hive/warehouse/records for storing the files that will be queried later. It is recommended to set up hive-site.xml to use the local file system as storage and single-process mode for execution. – JustCoder Nov 24 '16 at 11:56
@JustCoder. Power BI is using `http://example.com:10000/cliservice` as an example for the server. 10000 is the default Hive Thrift server port. Can it be used for HTTP? – Tom May 24 '18 at 06:53
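
On the HTTP question: as far as the Spark SQL documentation describes it, the Thrift server uses binary transport by default (port 10000) and only accepts HTTP when hive.server2.transport.mode is set to http, in which case the default HTTP port is 10001 and the default endpoint is cliservice. The sketch below lists those settings and builds the matching JDBC URLs. The helper name thrift_jdbc_url and the localhost values are assumptions for illustration only.

```python
# Server-side settings for HTTP mode (hive-site.xml in conf/ or system properties).
# The endpoint is left at its default, "cliservice".
HTTP_MODE_SETTINGS = {
    "hive.server2.transport.mode": "http",
    "hive.server2.thrift.http.port": "10001",
}


def thrift_jdbc_url(host, port, database="default", http_path=None):
    """Return a Hive JDBC URL; pass http_path to target an HTTP-mode server.

    Hypothetical helper for illustration, not part of Spark or Hive.
    """
    base = "jdbc:hive2://{}:{}/{}".format(host, port, database)
    if http_path is None:
        # Binary transport, the mode used by the beeline session in the question.
        return base
    # HTTP transport: the client must name the mode and the endpoint path.
    return "{}?hive.server2.transport.mode=http;hive.server2.thrift.http.path={}".format(
        base, http_path
    )


binary_url = thrift_jdbc_url("localhost", 10000)
http_url = thrift_jdbc_url("localhost", 10001, http_path="cliservice")
```

Either port number can be reconfigured, so 10000 could serve HTTP if the server is started in HTTP mode on that port, but a server left in binary mode will not understand HTTP requests on it.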
