1

I have looked all over for an answer to this and tried everything. Nothing seems to work. I'm trying to reference a variable assignment within a spark.sql query in python. Running python 3 and spark version 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax
Ronak Jain
  • 3,073
  • 1
  • 11
  • 17
email83
  • 21
  • 1
  • 5
  • bkt = 1 prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ="%bkt%"\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)") prime.show(50) – email83 Nov 01 '19 at 14:34
  • Is this a question? Not sure why you've posted more code in a comment, as well. Please read [How to Ask a Question](https://stackoverflow.com/help/how-to-ask) too. – David Buck Nov 01 '19 at 14:38
  • First of all `s"..."` is a syntax error - what is that supposed to mean? Secondly, trying to format a string with `$bkt` is not valid python syntax. Look up [String formatting in python](https://stackoverflow.com/questions/517355/string-formatting-in-python) – pault Nov 01 '19 at 14:44
  • The title of my post is my question. I got the s"..." from this answer which was marked correct on stackoverflow.https://stackoverflow.com/questions/37284216/spark-sql-passing-a-variable – email83 Nov 01 '19 at 14:57
  • @email83 I don't know what language that is, but the answer you're looking for is this one: https://stackoverflow.com/a/37284354/5858851 – pault Nov 01 '19 at 15:49
  • It's python. Yes, saw and tested that syntax as well. bkt = "1" prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ='{}'\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)").format(id) prime.show(50) AttributeError: 'DataFrame' object has no attribute 'format' – email83 Nov 01 '19 at 16:11

3 Answers3

1

I found the correct syntax buried in this databricks post.

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

You add a lower case f in front of the query and wrap braces around the name of the variable in the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)
David
  • 11,245
  • 3
  • 41
  • 46
email83
  • 21
  • 1
  • 5
  • Greetings from 2023! I just wanted to add that this same method also works with multiline query strings expressed with triple quotes like: `q = spark.sql(f"""SELECT * FROM {bkt}""")` – Swiffy Aug 28 '23 at 14:40
0

This should work

p_filename ='some value'
z='some value'

query = "INSERT into default.loginfordetails (filename,logdesc) values ('{}','{}') ".format(p_filename,z)
 
spark.sql(query)

n1tk
  • 2,406
  • 2
  • 21
  • 35
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 11 '22 at 16:19
0

Since your query do use multi-line query is it advised as good coding style, to use """ """ to be easier and avoid slash all together as a good coding style"\" that does have a lot of issues or conversion to be parsed in python and pyspark in general or when using nbconvert for notebook to scrip and variables in parentheses {} inside of query or use .format(bkt) :

option{}:

bkt=1;
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""")

prime.show(50);

option .format():

bkt=1;
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt)

prime.show(50);
n1tk
  • 2,406
  • 2
  • 21
  • 35