Spark SQL pass variable to query

Question

I have looked all over for an answer to this and tried everything. Nothing seems to work. I'm trying to reference a variable assignment within a spark.sql query in python. Running python 3 and spark version 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax

bkt = 1 prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ="%bkt%"\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)") prime.show(50) — email83, Nov 01 '19 at 14:34
Is this a question? Not sure why you've posted more code in a comment, as well. Please read [How to Ask a Question](https://stackoverflow.com/help/how-to-ask) too. — David Buck, Nov 01 '19 at 14:38
First of all `s"..."` is a syntax error - what is that supposed to mean? Secondly, trying to format a string with `$bkt` is not valid python syntax. Look up [String formatting in python](https://stackoverflow.com/questions/517355/string-formatting-in-python) — pault, Nov 01 '19 at 14:44
The title of my post is my question. I got the s"..." from this answer which was marked correct on stackoverflow.https://stackoverflow.com/questions/37284216/spark-sql-passing-a-variable — email83, Nov 01 '19 at 14:57
@email83 I don't know what language that is, but the answer you're looking for is this one: https://stackoverflow.com/a/37284354/5858851 — pault, Nov 01 '19 at 15:49
It's python. Yes, saw and tested that syntax as well. bkt = "1" prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ='{}'\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)").format(id) prime.show(50) AttributeError: 'DataFrame' object has no attribute 'format' — email83, Nov 01 '19 at 16:11

score 1 · Accepted Answer · edited Nov 01 '19 at 17:17

I found the correct syntax buried in this databricks post.

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

You add a lower case f in front of the query and wrap braces around the name of the variable in the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)

Greetings from 2023! I just wanted to add that this same method also works with multiline query strings expressed with triple quotes like: `q = spark.sql(f"""SELECT * FROM {bkt}""")` — Swiffy, Aug 28 '23 at 14:40

score 0 · Answer 2 · edited Jan 20 '23 at 23:35

0

This should work

p_filename ='some value'
z='some value'

query = "INSERT into default.loginfordetails (filename,logdesc) values ('{}','{}') ".format(p_filename,z)
 
spark.sql(query)

edited Jan 20 '23 at 23:35

n1tk

2,406
2
21
35

answered Jul 11 '22 at 11:42

Nihar Handoo

1

1

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 11 '22 at 16:19

score 0 · Answer 3 · answered Jan 22 '23 at 11:33

Since your query do use multi-line query is it advised as good coding style, to use """ """ to be easier and avoid slash all together as a good coding style"\" that does have a lot of issues or conversion to be parsed in python and pyspark in general or when using nbconvert for notebook to scrip and variables in parentheses {} inside of query or use .format(bkt) :

option{}:

bkt=1;
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""")

prime.show(50);

option .format():

bkt=1;
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt)

prime.show(50);

Spark SQL pass variable to query

3 Answers3

This should work