I am using PySpark SQL to create temporary views from DataFrames and run data-processing SQL against them. I built a Python service exposing APIs where a user can pass a DataFrame and the SQL query to apply to it. The problem arises when two users want to use the same name for a temporary view, so I was wondering whether there is a way to create a scoped temporary view in PySpark. I have read that all temporary views are session-scoped, but they appear to be shared between users.
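
For context, a minimal sketch of the pattern, with hypothetical names (`register_and_query`, the view name, and the sample data are only stand-ins for what a user would send to the API):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('view-service').master('local[4]').getOrCreate()

def register_and_query(df, view_name, query):
    # Register the user-supplied DataFrame under the requested view name,
    # then run the user-supplied SQL against it.
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)

# Two users both asking for the name 'myView' collide:
df_a = spark.createDataFrame([('myName', 18)], ['name', 'age'])
df_b = spark.createDataFrame([('otherName', 30)], ['name', 'age'])
register_and_query(df_a, 'myView', 'SELECT * FROM myView').show()
register_and_query(df_b, 'myView', 'SELECT * FROM myView').show()  # silently replaced user A's view
```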

I've thought of prefixing the view names with the user name, but it would add extra complexity to the code.
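
A minimal sketch of that workaround, assuming a per-request `user_id` (the helper and the naive `query.replace` rewrite are hypothetical, only to show the extra bookkeeping):

```python
def register_and_query_prefixed(spark, df, user_id, view_name, query):
    # Namespace the view per user, then rewrite the user's SQL to match.
    # The plain string replace is naive (it would also hit column names equal
    # to view_name), which is exactly the added complexity mentioned above.
    scoped_name = f'{user_id}_{view_name}'
    df.createOrReplaceTempView(scoped_name)
    return spark.sql(query.replace(view_name, scoped_name))
```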

  • Spark has "simple" and global temp views, as described in https://spark.apache.org/docs/3.2.1/sql-getting-started.html#global-temporary-view. "Simple" temp views are indeed session-scoped (see the sketch after these comments). – mazaneicha May 17 '22 at 18:44
  • So here's what I tried: create a session s1 with name 'app1'; create a session s2 with name 'app2'; create a temp view (non-global) using session s1; query the view with session s1 -> returns data; query the view with session s2 -> returns data. – Samuele Ceroni May 18 '22 at 09:48
  • `from pyspark.sql import SparkSession  s1 = SparkSession.builder.appName('app1').master('local[4]').getOrCreate()  s2 = SparkSession.builder.appName('app2').master('local[4]').getOrCreate()  data_json = [{'name': 'myName', 'age': 18}, {'name': 'myName2', 'age': 19}]  df = s1.read.json(s1.sparkContext.parallelize(data_json))  df.createOrReplaceTempView('myView')  # non global  sqlDF1 = s1.sql('select * from myView')  sqlDF2 = s2.sql('select * from myView')` — both return the data in data_json – Samuele Ceroni May 18 '22 at 09:56
  • SparkSession's static method `getOrCreate` is called that for a reason: it returns the already-active session (even with a different appName), so s1 and s2 are in fact the same session; see the sketch below and https://stackoverflow.com/questions/40153728/multiple-sparksessions-in-single-jvm – mazaneicha May 18 '22 at 10:54
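
To illustrate the distinction mazaneicha points to between "simple" and global temp views, a sketch (the session obtained with `newSession()` and the view names are arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('views-demo').master('local[4]').getOrCreate()
other = spark.newSession()  # separate session on the same SparkContext

df = spark.createDataFrame([('myName', 18)], ['name', 'age'])

# "Simple" temp view: visible only in the session that created it.
df.createOrReplaceTempView('people')
spark.sql('SELECT * FROM people').show()
# other.sql('SELECT * FROM people')  # would raise AnalysisException: table or view not found

# Global temp view: tied to the application, visible from any session
# through the global_temp database.
df.createGlobalTempView('people_global')
other.sql('SELECT * FROM global_temp.people_global').show()
```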
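A runnable version of the check in the comments above, plus what I believe the linked answer implies (assumption: `newSession()` is the intended way to get per-user isolation of temp views while sharing one SparkContext):

```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

s1 = SparkSession.builder.appName('app1').master('local[4]').getOrCreate()
s2 = SparkSession.builder.appName('app2').master('local[4]').getOrCreate()
print(s1 is s2)  # True: getOrCreate reuses the existing session, the different appName notwithstanding

df = s1.createDataFrame([('myName', 18), ('myName2', 19)], ['name', 'age'])
df.createOrReplaceTempView('myView')
print(s2.sql('SELECT * FROM myView').count())  # 2 -- same session, so the view is visible

# Per-user isolation: newSession() shares the SparkContext but keeps its
# own catalog of temporary views (and its own SQLConf and UDFs).
s3 = s1.newSession()
try:
    s3.sql('SELECT * FROM myView')
except AnalysisException:
    print('myView is not visible from the new session')
```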

0 Answers