
How do I append to a list when using foreach on a DataFrame? In my case, I would like to collect values from each row using a self-defined function and append them to a list. The function returns a list of values per row.

Here is some pseudocode for what I would like to do:

listOfDfs = []
# getRowInfo is a function I defined; it returns a list of values for a row
df.foreach(lambda row: listOfDfs.extend(getRowInfo(row)))
listOfDfs

But listOfDfs remains empty. I am guessing this is because foreach returns nothing. I've also tried map, but that doesn't seem to append to listOfDfs either. Can anyone help me with this?
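
For reference, here is a minimal runnable version of the above; the sample data and the toy getRowInfo are just placeholders for my real function:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

def getRowInfo(row):
    # placeholder for my real function; returns a list of values per row
    return [row.id, row.label]

listOfDfs = []
df.foreach(lambda row: listOfDfs.extend(getRowInfo(row)))
print(listOfDfs)  # prints [] -- nothing is ever appended on the driver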

Peter Trcka
  • I don't think this type of "mutate in place" implementation is encouraged with Spark. I believe what you can do instead is use `.collect()` on the column you want to collect: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.collect.html For example, `df.select("myCol").collect()` will give you a list of `Row` elements. You can access the row values by attribute: `[row.myCol for row in df.collect()]` (see the sketch below these comments) – k88 Oct 13 '21 at 06:21
  • will this help? https://stackoverflow.com/questions/42211594/lambda-to-assign-a-value-to-global-variable – Peter Trcka Oct 14 '21 at 16:02
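
A minimal sketch of the `.collect()` approach from the first comment, reusing the placeholder getRowInfo from the snippet above (the `flatMap` variant is an additional assumption, not something from the comment, and neither line is a tested solution):

# option 1: collect the rows to the driver, then build the list there
listOfDfs = [value for row in df.collect() for value in getRowInfo(row)]

# option 2: flatten the per-row lists on the executors, then collect the result
listOfDfs = df.rdd.flatMap(getRowInfo).collect()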

0 Answers