I have a PySpark DataFrame with an integer column "index" and a double column "feature". I also have a Python list, parameter, whose length equals the number of unique values in the "index" column.

I would like to generate a new column in PySpark as follows: for each row, if the value of "index" is i, the new column should hold "feature" multiplied by parameter[i].
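
To make the rule concrete, here is the same computation in plain Python over a handful of made-up rows. The sample values, the function name scale_rows, and the assumption that every "index" value is a valid position in parameter (0 to len(parameter) - 1) are illustrative only and not part of my real data.

```python
# Hypothetical sample data: (index, feature) pairs and one multiplier per index.
rows = [(0, 1.5), (1, 2.0), (2, 4.0), (0, 3.0)]
parameter = [10.0, 0.5, 2.0]


def scale_rows(rows, parameter):
    """Return (index, feature, feature * parameter[index]) for every row."""
    return [
        (index, feature, feature * parameter[index])
        for index, feature in rows
    ]


result = scale_rows(rows, parameter)
```

The third element of each tuple is the value the new column should contain for that row.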

For a small number of elements in parameter, I can chain when().when().otherwise() to generate the output. How should I do this when parameter has a large number of elements?

DiveIntoML

0 Answers