PySpark: Use dataframe column as index for python list

Asked Apr 09 '20 at 03:29

Active Apr 09 '20 at 04:05

Viewed 83 times

I have a PySpark dataframe with an integer column "index" and a double column "feature". I also have a Python list, parameter, whose length equals the number of unique values in the "index" column.

I would like to generate a new column in PySpark as follows: for each row, if the value of "index" is i, the new column should hold "feature" multiplied by parameter[i].
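To make this concrete, here is a plain-Python sketch of the result I am after. No Spark is involved, and the sample rows and parameter values are made up purely for illustration:

```python
# Made-up sample data: (index, feature) rows and a parameter list.
rows = [(0, 1.5), (2, 4.0), (1, -3.0), (0, 2.0)]
parameter = [10.0, 0.5, 2.0]  # one entry per distinct "index" value

# Desired new column: feature * parameter[index], computed row by row.
new_column = [feature * parameter[index] for index, feature in rows]
print(new_column)  # [15.0, 8.0, -1.5, 20.0]
```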

For a small number of elements in parameter, I can chain when().when().otherwise() to generate the output. How should I do it when the number of elements in parameter is large?
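One direction that might scale better than a long when() chain is to turn the list into a small lookup table and join on "index". The following is only a sketch under assumptions, not a confirmed answer: the helper names `lookup_rows` and `add_scaled_feature`, the intermediate column `weight`, and the output column `scaled_feature` are all hypothetical, and it assumes parameter is small enough to broadcast and that every "index" value is a valid position in the list.

```python
from typing import List, Sequence, Tuple


def lookup_rows(parameter: Sequence[float]) -> List[Tuple[int, float]]:
    """Turn the list into (index, weight) pairs, one per list position."""
    return [(i, float(p)) for i, p in enumerate(parameter)]


def add_scaled_feature(spark, df, parameter, out_col="scaled_feature"):
    """Return df with out_col = feature * parameter[index], computed via a join."""
    # Imported here so lookup_rows() stays usable without Spark installed.
    from pyspark.sql import functions as F

    # Small lookup DataFrame with columns "index" and "weight" (hypothetical name).
    lookup = spark.createDataFrame(lookup_rows(parameter), ["index", "weight"])

    # Broadcast hint: ask Spark to copy the small table to the executors
    # instead of shuffling df. A left join keeps every row of df; rows whose
    # index has no entry in the lookup table end up with a null result.
    joined = df.join(F.broadcast(lookup), on="index", how="left")

    return joined.withColumn(out_col, F.col("feature") * F.col("weight")).drop("weight")
```

With an existing SparkSession `spark` and the dataframe `df`, the call would be `add_scaled_feature(spark, df, parameter)`. The join keeps the query plan the same size no matter how long parameter is, whereas a when() chain grows by one branch per element.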

DiveIntoML

What is your spark version? – murtihash Apr 09 '20 at 03:35

the spark version is 2.4 – DiveIntoML Apr 09 '20 at 03:36

0 Answers