I need help with this simple piece of code (pyspark):

def ann(table):
    table=table.withColumn('stand', lit('29Jan2020'))
for table in [akt_test, b60_test, db71_test, pek6_test, db00f_test, d23b_test, bw0110_test]:
    ann(table)

So I am only trying to add the column "stand" to all (already existing) dataframes in the list. Unfortunately, the column is not added. Strangely, if I add the command "print(table.columns)" at the end of the function "ann", I see the new column there, but not in the actual dataframes.

If I simply take one dataframe and write

 akt_test=akt_test.withColumn('stand', lit('29Jan2020'))

everything works fine. But not in a loop. I don't understand why, or how I can fix it. Thanks in advance for your ideas.

mck
Logic_Problem_42

1 Answer

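The function has to return the modified dataframe. withColumn does not change a dataframe in place; it returns a new one. Inside your function, the assignment only rebinds the local name table to that new dataframe, so the variable you passed in still refers to the original.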

Also, you need to apply the function to the list elements and keep the results (e.g. using a list comprehension as below). A plain for loop that only calls the function does not change the list elements or the original variables: the loop variable is just another name for each dataframe, and the dataframe returned by the function is discarded at the end of each iteration. See this question for example.
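The rebinding behaviour described above can be reproduced without Spark. The following is only a minimal sketch: a tuple of column names stands in for an immutable dataframe, and add_column, ann_broken and ann_fixed are hypothetical helpers made up for illustration (add_column mimics withColumn by returning a new object).

```python
# Plain-Python stand-in (an assumption for illustration, not PySpark):
# a tuple of column names plays the role of an immutable dataframe.
def add_column(columns, name):
    # Like withColumn, this returns a new object and leaves the input alone.
    return columns + (name,)

def ann_broken(table):
    # Rebinds only the local name `table`; the result is lost on return.
    table = add_column(table, "stand")

def ann_fixed(table):
    # Hands the new object back so the caller can store it.
    return add_column(table, "stand")

akt_test = ("id", "value")
b60_test = ("id", "amount")

# Calling the broken version in a loop leaves the outer variables unchanged.
for table in [akt_test, b60_test]:
    ann_broken(table)
after_broken = akt_test

# Keeping the returned objects and reassigning the names does the job.
akt_test, b60_test = [ann_fixed(t) for t in [akt_test, b60_test]]
after_fixed = akt_test
```

Here after_broken still holds only the original two columns, while after_fixed also contains "stand".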

So, to make your code work, you can do this:

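from pyspark.sql.functions import lit

def ann(table):
    # withColumn returns a new dataframe, so hand it back to the caller
    return table.withColumn('stand', lit('29Jan2020'))

df_list = [akt_test, b60_test, db71_test, pek6_test, db00f_test, d23b_test, bw0110_test]

# Build a new list that holds the modified dataframes
df_list2 = [ann(df) for df in df_list]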

If you want the original variable names to refer to the modified dataframes, you can unpack the result back into them:

akt_test, b60_test, db71_test, pek6_test, db00f_test, d23b_test, bw0110_test = [ann(df) for df in df_list]
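If the list of names grows, one possible alternative to unpacking is to keep the dataframes in a dict keyed by name and rebuild the dict. This is only a sketch under assumptions: apply_to_all is a hypothetical helper, and tuples of column names stand in for dataframes so the example runs without a Spark session. With PySpark, the values would be the dataframes and the transform would be ann.

```python
def apply_to_all(frames, transform):
    """Return a new dict with `transform` applied to every value."""
    return {name: transform(df) for name, df in frames.items()}

# Stand-in data (an assumption for illustration): tuples of column names
# instead of real dataframes.
frames = {
    "akt_test": ("id",),
    "b60_test": ("id", "amount"),
}

# The lambda mimics ann by returning a new object with an extra column.
frames = apply_to_all(frames, lambda cols: cols + ("stand",))
```

Each dataframe is then looked up as frames["akt_test"] and so on, so no positional unpacking is needed.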
mck
  • Thanks a lot! But now the new dataframes are in the second list. So if I want the changed dataframe "akt_test" to keep the name "akt_test", I have to write "akt_test=df_list2[0]" and so on. That doesn't seem like an ideal solution to me. Can I do it more efficiently somehow? – Logic_Problem_42 Jan 29 '21 at 16:48
  • @Logic_Problem_42 see the last code snippet in the edited answer. – mck Jan 29 '21 at 16:51