
We are trying to use the Thrift server to query data from Spark temp tables in Spark 2.0.0.

First, we created a SparkSession with Hive support enabled. Currently, we start the Thrift server with the sqlContext like this:

HiveThriftServer2.startWithContext(spark.sqlContext());

We have a Spark stream registered as the temp table "spark_temp_table":

StreamingQuery streamingQuery = streamedData.writeStream()
                                             .format("memory")
                                             .queryName("spark_temp_table")
                                             .start();

With beeline we are able to see the temp tables (by running SHOW TABLES).

When we want to run a second job (with a second SparkSession) using this approach, we have to start a second Thrift server on a different port.
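
For a client other than beeline, the same check can be sketched over plain JDBC. This is only a minimal sketch and not part of the original setup: it assumes the Hive JDBC driver is on the classpath and that the Thrift server listens on localhost:10000 (both placeholders), and the class and helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class ShowTablesOverJdbc {

    // Joins the column values of one result row with " | " for display.
    static String formatRow(List<String> values) {
        return String.join(" | ", values);
    }

    public static void main(String[] args) {
        // Assumed URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000/default";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {

            // Read columns by position so the sketch does not depend on column names.
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                List<String> row = new ArrayList<>();
                for (int i = 1; i <= columns; i++) {
                    row.add(rs.getString(i));
                }
                System.out.println(formatRow(row));
            }
        } catch (SQLException e) {
            // Covers a missing driver as well as an unreachable server.
            System.err.println("Could not query the Thrift server: " + e.getMessage());
        }
    }
}
```

If "spark_temp_table" shows up here, the JDBC connection is using the same session that the streaming query registered the table in.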

I have two questions here:

  1. Is there any way to have one Thrift server on one port with access to all temp tables in different SparkSessions?

  2. HiveThriftServer2.startWithContext(spark.sqlContext()); is annotated with @DeveloperApi. Is there any way to start the Thrift server with a context without doing it programmatically in code?
    I saw that the configuration --conf spark.sql.hive.thriftServer.singleSession=true can be passed to the Thrift server on startup (sbin/start-thriftserver.sh), but I don't understand how to define this for a job. I tried setting this property in the SparkSession builder, but beeline didn't display temp tables.

VladoDemcak
  • Before answering your question, I'll ask one :) Do you really need to start `ThriftServer` "programmatically"? – user1314742 Nov 16 '16 at 13:19
  • @user1314742 No, we don't need to (and would rather avoid `HiveThriftServer2.startWithContext(spark.sqlContext());`). We actually tried starting `sbin/start-thriftserver.sh` with a single session, but with no luck. Basically, what we need is to query the `temp` tables through the Spark JDBC server (from a different application over a `JDBC` connection). – VladoDemcak Nov 16 '16 at 13:33
  • Is it possible to see temp tables when we are using `master local` at all? – VladoDemcak Nov 18 '16 at 13:46

1 Answer


Is there any way to have one Thrift server on one port with access to all temp tables in different SparkSessions?

No. A Thrift server uses one specific session, and temporary tables can be accessed only within that session. This is why:

beeline didn't display temp tables.

when you start an independent server with sbin/start-thriftserver.sh.

spark.sql.hive.thriftServer.singleSession doesn't mean you get a single session for multiple servers. It means that all connections to a single Thrift server share the same session. A possible use case:

  • You start the Thrift server.
  • client1 connects to this server and creates a temp table foo.
  • client2 connects to this server and reads foo.
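
The use case above can be sketched with two JDBC connections. This is an illustration under assumptions, not a tested recipe: it assumes a Thrift server started with spark.sql.hive.thriftServer.singleSession=true, the Hive JDBC driver on the classpath, and localhost:10000 as a placeholder address. The class name and the jdbcUrl helper are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class SharedSessionSketch {

    // Builds a HiveServer2-style JDBC URL from placeholder connection details.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed address of the single Thrift server.
        String url = jdbcUrl("localhost", 10000, "default");

        try (Connection client1 = DriverManager.getConnection(url);
             Connection client2 = DriverManager.getConnection(url)) {

            // client1 creates the temp table (a temporary view in Spark 2.x).
            try (Statement s1 = client1.createStatement()) {
                s1.execute("CREATE TEMPORARY VIEW foo AS SELECT 1 AS id");
            }

            // client2 reads foo over a separate connection to the same server.
            try (Statement s2 = client2.createStatement();
                 ResultSet rs = s2.executeQuery("SELECT id FROM foo")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id"));
                }
            }
        } catch (SQLException e) {
            // Covers a missing driver, an unreachable server, or foo not being visible.
            System.err.println("JDBC call failed: " + e.getMessage());
        }
    }
}
```

Without the single-session setting, each connection would be expected to get its own session, so the read on client2 should fail because foo is not visible there.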
  • Thank you for your answer, it's much clearer now, I really appreciate it! I have one more question. Let's say I want `client1` to be a *Spark streaming job*. Is it possible to get the `thriftserver session`, or how can I `"connect"` a streaming job to a running Thrift server started with `sbin/start-thriftserver.sh`? – VladoDemcak Nov 20 '16 at 12:50
  • @VladoDemcak have you found any other way than running HiveThriftServer2 programmatically? I have a very similar use case and wonder maybe I'm just picking the wrong tool here. – Roman Sep 26 '17 at 05:10