I am new to Scala and am converting PySpark code to Scala. In PySpark, we can reuse the same variable across multiple transformations. Below is an example:
final_df = final_df.withColumn('xyz', final_df["items_summaries_marketplaceId"])
# drop unwanted columns
unwanted_columns = [x for x in final_df.columns if 'xyz' in x or 'zxy' in x]
final_df = final_df.drop(*unwanted_columns)
Here, final_df is reused for both transformations. I have converted this code to Scala. From my research, it seems I have to declare a new variable after every transformation. Below is the code:
val final_df = df.withColumn("xyz", df("items_summaries_marketplaceId"))
val drop_language_cols = final_df.drop(final_df.columns.filter(_.contains("xyz")): _*)
val drop_cols = drop_language_cols.drop(drop_language_cols.columns.filter(_.contains("zxy")): _*)
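For reference, this is roughly the style I was hoping for: a sketch (not fully tested) that uses var so the name can be reassigned as in PySpark. It assumes df is an existing DataFrame with the "items_summaries_marketplaceId" column from the example above.

import org.apache.spark.sql.DataFrame

// Sketch only: `var` allows reassignment, mirroring the PySpark pattern.
// Assumes `df` already exists with an "items_summaries_marketplaceId" column.
var finalDf: DataFrame = df.withColumn("xyz", df("items_summaries_marketplaceId"))

// Drop every column whose name contains "xyz" or "zxy", as in the PySpark version.
finalDf = finalDf.drop(finalDf.columns.filter(c => c.contains("xyz") || c.contains("zxy")): _*)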
Do we have to declare a new variable after every transformation? Any help will be highly appreciated.