29

I am writing spark code in python. How do I pass a variable in a spark.sql query?

    q25 = 500
    Q1 = spark.sql("SELECT col1 from table where col2>500 limit $q25 , 1")

Currently the above code does not work. How do I pass variables into the query?

I have also tried,

    Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='{}' , 1".format(q25))
Viv
  • have you tried Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='%s' , 1" % (q25))? – Daniyal Syed Jun 16 '17 at 06:47
    Possible duplicate of [Does Python has a similar variable interpolation like "string #{var}" in Ruby?](https://stackoverflow.com/questions/11788472/does-python-has-a-similar-variable-interpolation-like-string-var-in-ruby) – OneCricketeer Jun 16 '17 at 12:33
  • It's just a string... Your question isn't really about Spark – OneCricketeer Jun 16 '17 at 12:34

5 Answers

22

You need to remove the single quotes and the `q25=` from the string formatting, like this:

Q1 = spark.sql("SELECT col1 from table where col2>500 limit {}, 1".format(q25))

Update:

Based on your new queries:

spark.sql("SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))

Note that Spark SQL does not support OFFSET, so a `limit {}, 1` query cannot work.
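Since there is no OFFSET clause, one common workaround (a sketch only, assuming the question's hypothetical `table` with columns `col1` and `col2`, and that `col1` gives a stable ordering) is to emulate `limit 500, 1` with a `row_number()` window function. The query is just built as a plain Python string, so it can be inspected before being handed to `spark.sql(query)`:

```python
q25 = 500  # number of rows to skip before taking one row

# Emulate "limit 500, 1" (skip 500 rows, take the next one) with
# row_number(), since Spark SQL has no OFFSET clause.
query = """
SELECT col1 FROM (
  SELECT col1, row_number() OVER (ORDER BY col1 DESC) AS rn
  FROM table
  WHERE col2 > 500
) t
WHERE t.rn = {0} + 1
""".format(q25)

# The rendered string would then be passed to spark.sql(query).
print(query)
```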

If you need to add multiple variables, you can do it this way:

q25 = 500
var2 = 50
Q1 = spark.sql("SELECT col1 from table where col2>{0} limit {1}".format(var2,q25))
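As a quick sanity check (plain Python, no Spark session needed), you can render the string first and inspect exactly what Spark will receive:

```python
q25 = 500
var2 = 50

# .format substitutes the variables by position before Spark sees the query.
query = "SELECT col1 from table where col2>{0} limit {1}".format(var2, q25)
print(query)  # SELECT col1 from table where col2>50 limit 500
```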
Tiny.D
  • This is still giving me mismatched input exception : spark.sql(SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25)) – Viv Jun 16 '17 at 07:31
  • mismatched input for ',' – Viv Jun 16 '17 at 07:41
  • before SELECT, you need double quotes. – Tiny.D Jun 16 '17 at 07:43
  • I have used ". Looks like the query does not work with ,1, i.e. OFFSET is not supported in spark sql. Any workarounds? – Viv Jun 16 '17 at 07:46
  • @Viv yes, you are right, SparkSQL does not support OFFSET. you can refer to this answer https://stackoverflow.com/questions/42560815/in-pyspark-hivecontext-what-is-the-equivalent-of-sql-offset – Tiny.D Jun 16 '17 at 08:08
  • Now how to add the second {} if there are 2 variables i want to use? – Viv Jun 16 '17 at 10:28
9

Using the f-string approach (PySpark):

table = 'my_schema.my_table'

df = spark.sql(f'select * from {table}')
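The same works for numeric values, since the variable is substituted into the string before Spark ever sees it. A sketch using the question's `q25` and a hypothetical table name (again, the rendered string is what gets passed to `spark.sql`):

```python
q25 = 500
table = 'my_schema.my_table'

# f-strings interpolate both the table name and the numeric limit.
query = f"SELECT col1 from {table} where col2>500 limit {q25}"
print(query)  # SELECT col1 from my_schema.my_table where col2>500 limit 500
```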
Luiz Viola
5

Another option, if you do this sort of thing often or want your code to be easier to reuse, is to keep a map of configuration variables and unpack it into `format`:

configs = {"q25":10,
           "TABLE_NAME":"my_table",
           "SCHEMA":"my_schema"}
Q1 = spark.sql("""SELECT col1 from {SCHEMA}.{TABLE_NAME} 
                  where col2>500 
                  limit {q25}
               """.format(**configs))
David Maddox
  • this looks great. however, it doesn't work as it says name: 'configs' is not defined. What am I missing? @David Maddox – Yags Jun 30 '20 at 16:44
  • Found the mistake: if you're using a Jupyter notebook you have to define the dictionary in the same cell as well – Yags Jun 30 '20 at 16:55
2

A really easy solution is to store the query in a string variable (using the usual Python formatting) and then pass it to the spark.sql() function:

q25 = 500

query = "SELECT col1 from table where col2>500 limit {}".format(q25)

Q1 = spark.sql(query)
user6386471
1

In Scala, all you need to do is prefix the string with s (the string interpolator). This allows variables to be used directly inside the string:

    val q25 = 10
    val Q1 = spark.sql(s"SELECT col1 from table where col2>500 limit $q25")