
I'm trying to understand how to use Spark global temporary views.

In one spark-shell session I've created a view

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('spark_sql').getOrCreate()

df = (
    spark.read.option("header", "true")
    .option("delimiter", ",")
    .option("inferSchema", "true")
    .csv("/user/root/data/cars.csv")
)

df.createGlobalTempView("my_cars")

# works without any problem
spark.sql("SELECT * FROM global_temp.my_cars").show()

And in another session I tried to access it, without success (table or view not found).

# second spark-shell session
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('spark_sql').getOrCreate()
spark.sql("SELECT * FROM global_temp.my_cars").show()

That's the error I receive:

 pyspark.sql.utils.AnalysisException: u"Table or view not found: `global_temp`.`my_cars`; line 1 pos 14;\n'Project [*]\n+- 'UnresolvedRelation `global_temp`.`my_cars`\n"

I've read that each spark-shell has its own context, and that's why one spark-shell cannot see the other. So I don't understand: what is the use of a global temporary view, and where would it be useful?

Thanks

Dan
  • Can you please share the code showing how you try to access the view? – addmeaning Mar 05 '18 at 10:39
  • Hi, I added the code to my question – Dan Mar 05 '18 at 10:43
  • It is probably unrelated, but you may skip the `SparkSession` initialization, since it is already initialized when you start `spark-shell`. Since your code looks reasonable, can you also include the error message you got? – addmeaning Mar 05 '18 at 12:45
  • I added the error message. Are you able to execute the same code on your machine (with a different table, of course)? – Dan Mar 05 '18 at 13:35
  • Can you connect to Hive? If yes please check if the table my_cars exists there. – abiratsis Mar 05 '18 at 21:19
  • Hi, just tried it. I don't see the table in Hive – Dan Mar 06 '18 at 07:17
  • The global_temp views are scoped to Spark sessions that are concurrent with the Spark session that created the view, so they won't be visible in Hive (see the snippet below). When the application ends, the view is gone. – Davos Apr 17 '18 at 06:06
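A quick way to confirm this from the session that created the view (a sketch using the catalog API; global temp views live in the in-memory global_temp database, not in the Hive metastore):

# list the views registered in the special global_temp database
spark.catalog.listTables("global_temp")
# the global view appears here, but it is never written to the Hive metastore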

2 Answers


In the Spark documentation you can see:

If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view.

The global temporary view remains accessible as long as the application is alive. Opening a new shell and giving it the same application name will still create a new, separate application.

You can try and test it within the same shell:

spark.newSession().sql("SELECT * FROM global_temp.my_cars").show()

Please see my answer on a similar question for a more detailed example, as well as a short definition of a Spark application and a Spark session.
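For example, within a single PySpark application (a rough sketch; the view name is taken from the question and the DataFrame is just a stand-in):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('spark_sql').getOrCreate()

df = spark.range(5)  # any DataFrame will do
df.createGlobalTempView("my_cars")

# a second session of the SAME application can see the global view
other = spark.newSession()
other.sql("SELECT * FROM global_temp.my_cars").show()

# a plain, session-scoped temp view would not be visible from `other`
df.createTempView("local_cars")
# other.sql("SELECT * FROM local_cars").show()  # would fail: Table or view not found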

Avi Chalbani
  • Thanks Avi, I guess my question is how do you actually implement this kind of long-lived application, where you interactively create and close sessions. – Dan Mar 16 '18 at 05:33
  • It depends what you want to do. In a batch job you can open multiple sessions to do different stuff. The batch job terminates when you are done. The streaming context, on the other hand, remains open and will be executed periodically. – Avi Chalbani Mar 18 '18 at 11:15
  • @Dan If you are interactively creating and closing sessions then it's not a long-running application. global_temp views are intended for sharing data between users; I wouldn't use them as a shared memory space or IPC for applications. If you have separate processes that need to share data, persist it to a permanent table or a file (sketched below). If it is intended to be a long-running data processing pipeline, use the spark-submit command line to send a jar or Python script to the cluster with command line parameters, or an appropriate API for your Spark distribution, which will run in a single Spark session. – Davos Apr 17 '18 at 06:54
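A minimal sketch of that cross-application approach, persisting the data as a permanent table instead of a global temp view (the table name my_cars_tbl is made up for illustration):

# application A: save the DataFrame as a permanent table in the metastore
df.write.mode("overwrite").saveAsTable("my_cars_tbl")

# application B (a separate spark-shell or spark-submit job): read it back
spark.sql("SELECT * FROM my_cars_tbl").show()
# or equivalently:
spark.table("my_cars_tbl").show()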

Temporary views in Spark SQL are session-scoped and will disappear if the session that creates them terminates. If you want to have a temporary view that is shared among all sessions and kept alive until the Spark application terminates, you can create a global temporary view. A global temporary view is tied to a system-preserved database global_temp, and we must use the qualified name to refer to it:

df.createGlobalTempView("people")
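
For example (following the documentation's people view; df is whatever DataFrame you loaded):

# the view must be qualified with the global_temp database
spark.sql("SELECT * FROM global_temp.people").show()

# it is also visible from another session of the same application
spark.newSession().sql("SELECT * FROM global_temp.people").show()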

Viraj Wadate