0

I am trying to construct sql query inside databricks and this query needs to be done for every year and hence trying to add the year and table name as a variable inside the query as below.

I am trying to create 2 Temp tables for each year first as below

df_Conc_2019 = df_Conc.filter(col("Year") == 2019)
df_Conc_2019.createOrReplaceTempView('Conc_2019')

df_Conc_2020 = df_Conc.filter(col("Year") == 2020)
df_Conc_2020.createOrReplaceTempView('Conc_2020')

Then I am trying to join these tables (one by one or as per year with the main table) as below. But $Conc_Year and $Year inside my query don't work. I am trying to use for loop to achieve this as I have more than 2 years to do the analysis. Thanks.

Conc_Year = 'Conc_2019', 'Conc_2020'
Year = 2019, 2020 

query = "select A.invoice_date, A.Program_Year, $Conc_Year.BusinessSegment, $Conc_Year.Dealer_Prov, $Conc_Year.product_id... WHERE A.ID = $Conc_Year.ID AND A.Program_Year = $Year"

df_final = spark.sql(query)
Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Lilly
  • 910
  • 17
  • 38

1 Answers1

1

Use f"{variable}" for format string in Python. For example:

for Year in [2019, 2020]:
    Conc_Year = f"Conc_{Year}"
    query = f"""
        select A.invoice_date,
               A.Program_Year,
               {Conc_Year}.BusinessSegment,
               {Conc_Year}.Dealer_Prov, 
               {Conc_Year}.product_id
        from A, {Conc_Year}
        WHERE A.ID = {Conc_Year}.ID AND A.Program_Year = {Year}
    """
mck
  • 40,932
  • 13
  • 35
  • 50