
I need to build the final dataframe name dynamically from the config (joining `final_df` and `suffix`). When I run the code at the end, I get the error `SyntaxError: can't assign to operator`. However, if I replace `each["final_df"]+'_'+ each["suffix"]` with any other name, it works.

Data :

df_source_1 = spark.createDataFrame(
        [
          (123,10),
          (123,15),
          (123,20)
        ],
        ("cust_id", "value")
    )

Config:

config = """
                [ 
                  {
                      "source_df":"df_source_1",
                      "suffix": "new", 
                      "group":["cust_id"],
                      "final_df": "df_taregt_1"
                  }
                ]
                """   

Code:

import json   
for each in json.loads(config):
    print("Before=",each['final_df'] ) # str object
    print(each["final_df"]+'_'+ each["suffix"]) # df_taregt_1_new , print statement works
    each["final_df"]+'_'+ each["suffix"] = eval(each["source_df"]).groupBy(each["group"]).agg(sum("value")) # Errors out. Here I need to assign the dataframe to df_taregt_1_new
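A minimal reproduction of the error, independent of Spark. Python only allows a name, attribute, or subscript as the target of `=`; an operator expression like `a + b` on the left-hand side fails at compile time:

```python
# Compiling an assignment whose target is a "+" expression raises
# SyntaxError, which is the same failure as the line above.
try:
    compile('d["final_df"] + "_" + d["suffix"] = 1', "<repro>", "exec")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```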

Could anyone help?

Matthew
  • That's a terrible implementation you're trying to do. You should probably explain why you want to do that ... using a dict with key as the old or new name would be way better. [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) – Steven Aug 31 '21 at 12:21
  • 1
    FYI, `each["final_df"]+'_'+ each["suffix"]` is a string, it cannot be assigned. That's why you got the error. – Steven Aug 31 '21 at 12:24
  • I posted an oversimplified use case. In the real case, for the same data source/group, I have different operations, i.e. min and max. So I wanted to create two dataframes, with names created dynamically based on the operation, so that at the end I can combine both dataframes into the one mentioned in "final_df". The config would look like: `{ "source_df":"df_source_1", "operation": { "min": {}, "max" : {}}, "group":["cust_id"], "final_df": "df_taregt_1" }` – Matthew Aug 31 '21 at 12:31
  • @Steven, also in cases where we need to source the dataframe names from config, what alternatives do we have other than eval? – Matthew Aug 31 '21 at 12:39
  • Use a dict ... That's much simpler. No need to create dynamic variables, no need to use eval. – Steven Aug 31 '21 at 12:46
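Sketching the min/max use case from the comments with the dict approach Steven suggests. This is a plain-Python sketch: the list `[10, 15, 20]` is a placeholder standing in for a Spark DataFrame, and the `min`/`max` calls stand in for `groupBy(...).agg(...)`; the names come from the question's config.

```python
import json

# Extended config from the comments: several operations per source.
config = """
[
  {
    "source_df": "df_source_1",
    "operation": {"min": {}, "max": {}},
    "group": ["cust_id"],
    "final_df": "df_taregt_1"
  }
]
"""

df_dict = {"df_source_1": [10, 15, 20]}  # placeholder for a DataFrame

for each in json.loads(config):
    for op in each["operation"]:
        # Keys like df_taregt_1_min / df_taregt_1_max -- no eval() needed.
        key = each["final_df"] + "_" + op
        # Stand-in for groupBy(each["group"]).agg(F.min/F.max(...)).
        df_dict[key] = {"min": min, "max": max}[op](df_dict[each["source_df"]])

print(sorted(k for k in df_dict if k.startswith("df_taregt_1")))
# ['df_taregt_1_max', 'df_taregt_1_min']
```

At the end, the two per-operation entries can be looked up by their dynamic keys and combined into the final dataframe.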

1 Answer


Your code with a dict:

import json
from pyspark.sql import functions as F  # use F.sum, not the builtin sum

df_dict = {}
df_dict["df_source_1"] = spark.createDataFrame(
    [(123, 10), (123, 15), (123, 20)], ("cust_id", "value")
)

for each in json.loads(config):
    df_dict[each["final_df"] + "_" + each["suffix"]] = (
        df_dict[each["source_df"]].groupBy(each["group"]).agg(F.sum("value"))
    )

Instead of working with objects that are supposedly created dynamically, you have a dict that stores all these objects under their dynamic names. You can even test the dict to know whether an object exists or not.
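For example (the key and placeholder value below are assumed, following the names in the question's config):

```python
# The dict doubles as a registry: a membership test replaces eval().
df_dict = {"df_taregt_1_new": "placeholder DataFrame"}

name = "df_taregt_1" + "_" + "new"   # built dynamically from the config
if name in df_dict:
    result = df_dict[name]           # safe lookup by dynamic name
print(name in df_dict)  # True
```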

Steven